sagnikbanerjee15 / Finder

A fully automated gene annotator from RNA-Seq expression data
MIT License

Error during STAR alignment #28

Closed: bshrestha0 closed this issue 2 years ago

bshrestha0 commented 3 years ago

Hi there,

I am getting an error while using Finder with RNA-Seq data available in my local directory. Here's the error:

cat: /elegans/Finder/FINDER_elegans/alignments/SRR9265068_round3_SJ.out.tab: No such file or directory
cat: /elegans/Finder/FINDER_elegans/alignments/SRR1741331_round3_SJ.out.tab: No such file or directory

Here's the output (partial) generated in the alignments directory:

-rw-r--r--  1 bshrestha ebpproject 186K Oct 12 11:22 SRR9265068_final.sortedByCoord.out.bam.csi
-rw-r--r--  1 bshrestha ebpproject 356K Oct 12 11:21 SRR9265068_final.sortedByCoord.out.bam.bai
-rw-r--r--  1 bshrestha ebpproject 3.1M Oct 12 11:20 not_available_round1_and_round2_and_round3_SJ.out.tab
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 11:20 not_available_round3_SJ.out.tab
drwx------  3 bshrestha ebpproject  512 Oct 12 11:20 SRR9265068_final__STARtmp
-rw-r--r--  1 bshrestha ebpproject  239 Oct 12 11:20 SRR9265068_relaxed.output
-rw-r--r--  1 bshrestha ebpproject 2.0K Oct 12 11:20 SRR9265068_round3_Log.final.out
-rw-r--r--  1 bshrestha ebpproject 4.8G Oct 12 11:20 SRR9265068_final.sortedByCoord.out.bam
-rw-r--r--  1 bshrestha ebpproject 968M Oct 12 11:17 SRR9265068_round3_Unmapped.out.mate2
-rw-r--r--  1 bshrestha ebpproject 968M Oct 12 11:17 SRR9265068_round3_Unmapped.out.mate1
-rw-r--r--  1 bshrestha ebpproject 4.2M Oct 12 11:17 SRR9265068_final_SJ.out.tab
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 11:05 SRR9265068_relaxed.error
-rw-r--r--  1 bshrestha ebpproject 3.1M Oct 12 11:05 not_available_round1_and_round2_SJ.out.tab
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 11:05 not_available_round2_SJ.out.tab
drwx------  3 bshrestha ebpproject  512 Oct 12 11:05 SRR9265068_round2__STARtmp
-rw-r--r--  1 bshrestha ebpproject 2.2G Oct 12 11:05 SRR9265068_round2_Unmapped.out.mate2
-rw-r--r--  1 bshrestha ebpproject  305 Oct 12 11:04 SRR9265068_round2.output
-rw-r--r--  1 bshrestha ebpproject 2.2G Oct 12 11:04 SRR9265068_round2_Unmapped.out.mate1
-rw-r--r--  1 bshrestha ebpproject 2.0K Oct 12 11:04 SRR9265068_round2_Log.final.out
-rw-r--r--  1 bshrestha ebpproject 206M Oct 12 11:04 SRR9265068_round2_Aligned.sortedByCoord.out.bam
-rw-r--r--  1 bshrestha ebpproject 2.1M Oct 12 11:04 SRR9265068_round2_SJ.out.tab
drwx------  2 bshrestha ebpproject 1.0K Oct 12 10:57 SRR9265068_round2__STARgenome
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 10:57 SRR9265068_round2.error
-rw-r--r--  1 bshrestha ebpproject 3.1M Oct 12 10:57 not_available_round1_SJ.out.tab
drwx------  3 bshrestha ebpproject  512 Oct 12 10:57 SRR9265068_round1__STARtmp
-rw-r--r--  1 bshrestha ebpproject  239 Oct 12 10:57 SRR9265068_round1.output
-rw-r--r--  1 bshrestha ebpproject 2.0K Oct 12 10:57 SRR9265068_round1_Log.final.out
-rw-r--r--  1 bshrestha ebpproject 3.8G Oct 12 10:57 SRR9265068_round1_Aligned.sortedByCoord.out.bam
-rw-r--r--  1 bshrestha ebpproject 2.6G Oct 12 10:55 SRR9265068_round1_Unmapped.out.mate2
-rw-r--r--  1 bshrestha ebpproject 2.6G Oct 12 10:55 SRR9265068_round1_Unmapped.out.mate1
-rw-r--r--  1 bshrestha ebpproject 3.9M Oct 12 10:54 SRR9265068_round1_SJ.out.tab
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 10:42 SRR9265068_round1.error

As you can see, no SRR9265068_round3_SJ.out.tab file was created during the third round, although SRR9265068_final_SJ.out.tab was created. For some libraries, however, the "round3_SJ.out.tab" files were created after the third round completed, as shown below:

-rw-r--r--  1 bshrestha ebpproject 2.8M Oct 12 12:09 whole_worm_stress_round1_and_round2_and_round3_SJ.out.tab
drwxr-xr-x 14 bshrestha ebpproject  47K Oct 12 12:09 .
-rw-r--r--  1 bshrestha ebpproject 3.5K Oct 12 12:09 whole_worm_stress_round3_SJ.out.tab
drwx------  3 bshrestha ebpproject  512 Oct 12 12:09 SRR14458419_round3__STARtmp
-rw-r--r--  1 bshrestha ebpproject  239 Oct 12 12:09 SRR14458419_round3.output
-rw-r--r--  1 bshrestha ebpproject 431M Oct 12 12:09 SRR14458419_round3_Unmapped.out.mate2
-rw-r--r--  1 bshrestha ebpproject 431M Oct 12 12:09 SRR14458419_round3_Unmapped.out.mate1
-rw-r--r--  1 bshrestha ebpproject 2.0K Oct 12 12:09 SRR14458419_round3_Log.final.out
-rw-r--r--  1 bshrestha ebpproject 1.9M Oct 12 12:09 SRR14458419_round3_Aligned.sortedByCoord.out.bam
-rw-r--r--  1 bshrestha ebpproject 3.6K Oct 12 12:09 SRR14458419_round3_SJ.out.tab
-rw-r--r--  1 bshrestha ebpproject    0 Oct 12 11:57 SRR14458419_round3.error
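To see at a glance which libraries are affected, a small check like the one below can help. It is only a sketch: `missing_round3` is a hypothetical helper name, and it assumes the per-library file is always called `<accession>_round3_SJ.out.tab`, as in the error message and the listing above.

```shell
# Hypothetical helper: print every accession that has no
# <accession>_round3_SJ.out.tab file in the given alignments directory.
# Usage: missing_round3 <alignments_dir> <accession> [<accession> ...]
missing_round3() {
    local dir="$1" acc
    shift
    for acc in "$@"; do
        if [ ! -e "$dir/${acc}_round3_SJ.out.tab" ]; then
            echo "$acc"
        fi
    done
}
```

For example, `missing_round3 FINDER_elegans/alignments SRR9265068 SRR1741331 SRR14458419` would print only the accessions whose round-3 junction file is absent.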

I also tried using Finder to download the SRA files, but it did not download them properly. Please see the attachment. So I downloaded the RNA-Seq reads to my local computer and used them as input.

Any suggestions on how to fix this?

Thank you

[Attachment: Screen Shot 2021-10-11 at 10 22 03 PM]

sagnikbanerjee15 commented 3 years ago

Hello @bshrestha0,

Thank you very much for your interest in our software finder. It should have produced all the *tab files for all the RNA-Seq samples. I need to inspect it further. Could you please send me the metadata csv along with the genome file that you are trying to annotate? Also, I am not sure why the downloading of the data from NCBI is behaving erratically. I can see that the sample SRR14458419 does not have a correctly named file. Could you tell me what command you used to download the files?

Thank you.

bshrestha0 commented 3 years ago

Hi Sagnik,

Here's my metadata csv file:

BioProject,SRA Accession,Tissues,Description,Date,Read Length (bp),Ended,RNA Seq,process,Location
PRJNA548230,SRR9265068,not_available,cDNA;Illumina HiSeq 4000,3/15/17,150,PE,1,1,RNA_reads
PRJNA271608,SRR1741331,whole_worm,cDNA;Illumina HiSeq 2000,3/15/17,150,PE,1,1,RNA_reads
PRJNA727816,SRR14458419,whole_worm_stress,cDNA;DNBSEQ-G400,3/15/17,150,PE,1,1,RNA_reads
PRJNA215361,SRR953130,embryo,cDNA;Illumina HiSeq 2000,3/15/17,150,PE,1,1,RNA_reads
PRJNA215361,SRR953118,L2-L3,cDNA;Illumina HiSeq 2000,3/15/17,150,PE,1,1,RNA_reads
PRJNA574273,SRR10189241,adults_embryo,cDNA;Illumina HiSeq 2500,3/15/17,150,PE,1,1,RNA_reads
PRJNA215361,SRR953117,L1,cDNA;Illumina HiSeq 2000,3/15/17,150,PE,1,1,RNA_reads
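When cross-checking the metadata against files on disk, it can be handy to pull out just the accession, the layout and the read location. The snippet below is a sketch under two assumptions: the column order is the one shown above (accession in column 2, Ended in column 7, Location in column 10), and no field contains a comma. `list_samples` is a hypothetical name, not part of Finder.

```shell
# Hypothetical helper: print "accession<TAB>layout<TAB>location" for each
# data row of a metadata csv, skipping the header line.
# Assumes the 10-column layout shown above and no commas inside fields.
list_samples() {
    awk -F',' 'NR > 1 && NF >= 10 { printf "%s\t%s\t%s\n", $2, $7, $10 }' "$1"
}
```

Running `list_samples elegans_metadata.csv` prints one tab-separated line per library.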

The description and date in the metadata file may not be accurate, but I guess that shouldn't matter. "RNA_reads" is the folder where I downloaded the SRA reads. When I used Finder to download the data, I removed "RNA_reads" from the metadata csv file but kept the rest as is. For this test run, I used the C. elegans softmasked genome available from Ensembl Genomes. I can email you the genome file if you share your email address. The command that I used for the run:

finder -no_cleanup -mf elegans_metadata.csv -n 18  -gdir_star $PWD/star_index_without_transcriptome \
        -out_dir $PWD/FINDER_elegans -g $genome/elegans_genome_sm-filtered.fa \
        -p $PWD/ensembl_elegans.pep -gdir_olego $olego/olego_index -preserve -pc_clean

To download the SRA files, I used fastq-dump from sratoolkit in a loop:

# Read one SRR accession per line and download it as split FASTQ files
while IFS= read -r LINE
do
        let count++
        fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files "$LINE"
        echo "$LINE"
done < "$FILENAME"

$FILENAME lists the SRR IDs of the libraries that I want to download, one per line.
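After a download loop like this, it may be worth confirming that both mate files exist for every paired-end accession. With `--split-files`, fastq-dump writes `<accession>_1.fastq` and `<accession>_2.fastq` for paired-end runs. The sketch below checks only for those names; `check_pairs` is a hypothetical helper, and whether Finder expects exactly these file names is an assumption.

```shell
# Hypothetical helper: report paired-end accessions whose FASTQ mates are
# missing or empty in the given reads directory.
# Usage: check_pairs <reads_dir> <accession> [<accession> ...]
# Returns 0 if every mate file is present and non-empty, 1 otherwise.
check_pairs() {
    local dir="$1" acc mate status=0
    shift
    for acc in "$@"; do
        for mate in 1 2; do
            # -s is true only for files that exist and are not empty
            if [ ! -s "$dir/${acc}_${mate}.fastq" ]; then
                echo "missing or empty: ${acc}_${mate}.fastq"
                status=1
            fi
        done
    done
    return "$status"
}
```

For example, `check_pairs RNA_reads SRR9265068 SRR1741331` prints nothing and returns 0 when all four files are in place.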

Thanks,

Bikash

sagnikbanerjee15 commented 3 years ago

Hi @bshrestha0,

Thank you for sending me the files. I will investigate further. Could you please send me the genome file elegans_genome_sm-filtered.fa and the proteins ensembl_elegans.pep? You could email those to sagnikbanerjee15@gmail.com

Thanks.

Maxim-Karpov commented 1 year ago

Hello, I've encountered a similar issue. Have you been able to find the reason behind it and a solution?

sagnikbanerjee15 commented 1 year ago

Hi @Maxim-Karpov,

Thank you for your patience. I found an issue with the code. The new version of finder will take care of this.

Thank you.