lifebit-ai / nf-rnaseq-salmon

MIT License

Error with the se.r merge script #3

Closed manu-lifebit closed 2 years ago

manu-lifebit commented 2 years ago

When I run the pipeline with the salmon option and the original additional fasta file (https://github.com/nf-core/test-datasets/raw/rnaseq/reference/gfp.fa), I get an error after salmon generates the read count files. The error appears in the SALMON_MERGE process when running the se.r script, specifically in this loop: https://github.com/lifebit-ai/nf-rnaseq-salmon/blob/f69b962347527d54e40ccf9bab53103d69707820/bin/se.r#L25-L29. There are two possible reasons for this.


1- The error only occurs in the transcript read count merge. It seems that in this case the script detects that the transcript files don't contain more ids than the gene annotations and treats the transcript file as a gene file, removing the first id column and producing the error.

2- The additional fasta file contains the sequence of the gfp gene/transcript. The name of this gene is slightly different from the name present in the other ref files (Gfp_transgene vs Gfp_transgene_gene). I am not sure how important this is. I cloned this ref file with the correct name and the error disappears, but this seems to be because the script finds some id inconsistency and ignores this sequence. This error shouldn't appear in new executions with other reference files, but it should be taken into account.
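To make the naming mismatch in point 2 concrete, here is a minimal sketch of a pre-flight check that lists fasta sequence ids missing from the annotation ids. It is not part of the pipeline or of se.r. The helper names (`fasta_ids`, `unmatched_ids`), the toy sequence, and the `GENE_A` id are hypothetical, and which of the two gfp names lives in which file is an assumption made only for the example.

```python
def fasta_ids(fasta_text):
    """Return the sequence ids (first token of each '>' header line)."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def unmatched_ids(fasta_text, annotation_ids):
    """Return fasta ids that do not appear among the annotation ids."""
    known = set(annotation_ids)
    return [seq_id for seq_id in fasta_ids(fasta_text) if seq_id not in known]


# Toy inputs (assumed, not taken from the real reference files).
example_fasta = ">Gfp_transgene\nATGGTGAGCAAGGGCGAG\n"
example_annotation_ids = ["Gfp_transgene_gene", "GENE_A"]

missing = unmatched_ids(example_fasta, example_annotation_ids)
print(missing)
```

Running a check like this before the merge step would flag an additional fasta whose ids do not line up with the other reference files, instead of letting the mismatch surface later as a merge error.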

manu-lifebit commented 2 years ago

This error using the test files is caused by different factors and involves some serendipity:

manu-lifebit commented 2 years ago

Error addressed in https://github.com/lifebit-ai/nf-rnaseq-salmon/pull/6. Issue closed.