sagnikbanerjee15 / Finder

A fully automated gene annotator from RNA-Seq expression data
MIT License

Issues during checkpoints 2/3 #71

Open Maxim-Karpov opened 1 year ago

Maxim-Karpov commented 1 year ago

Hello, I've encountered 2 different issues when running Finder on 2 separate genomes.

1) INFO: Creating SIF file...
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:350: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  coverage_info[transcript_id]["bed_cov"] = np.array( temp )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:656: RuntimeWarning: invalid value encountered in double_scalars
  ratio2 = round( np.average( coverage_2nd_portion ) / np.average( coverage_3rd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:657: RuntimeWarning: invalid value encountered in double_scalars
  ratio3 = round( np.average( coverage_3rd_portion ) / np.average( coverage_2nd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:657: RuntimeWarning: divide by zero encountered in double_scalars
  ratio3 = round( np.average( coverage_3rd_portion ) / np.average( coverage_2nd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:656: RuntimeWarning: divide by zero encountered in double_scalars
  ratio2 = round( np.average( coverage_2nd_portion ) / np.average( coverage_3rd_portion ), 2 )
Traceback (most recent call last):
  File "/softwares/FINDER/Finder/finder", line 688, in <module>
    main()
  File "/softwares/FINDER/Finder/finder", line 649, in main
    orchestrateGeneModelPrediction( options, logger_proxy, logging_mutex )
  File "/softwares/FINDER/Finder/finder", line 491, in orchestrateGeneModelPrediction
    fixOverlappingAndMergedTranscripts( options, logger_proxy, logging_mutex )
  File "/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py", line 740, in fixOverlappingAndMergedTranscripts
    exon = list( map( int, exon.split( "-" ) ) )
ValueError: invalid literal for int() with base 10: '1e+05'

The exon coordinate reaches this line in scientific notation ('1e+05'), which int() cannot parse. I believe line 740 of fixOverlappingAndMergedTranscripts.py needs to be changed from exon = list( map( int, exon.split( "-" ) ) ) to exon = list( map( int, map( float, exon.split( "-" ) ) ) ) to fix this.
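
A minimal sketch of that workaround, written as a standalone helper so it can be tried outside Finder. The name parse_exon is hypothetical and is not part of Finder; the extra is_integer() check is my own addition so that a genuinely fractional value is rejected rather than silently truncated. It assumes coordinates are non-negative and separated by a single "-", as in the traceback above.

```python
from typing import List


def parse_exon(exon: str) -> List[int]:
    """Parse a 'start-end' exon string into two integers.

    Hypothetical helper (not part of Finder). Going through float()
    accepts coordinates written in scientific notation such as '1e+05',
    which a bare int() rejects with ValueError.
    """
    coords = []
    for field in exon.split("-"):
        value = float(field)
        # Refuse values such as '12.5' instead of truncating them.
        if not value.is_integer():
            raise ValueError("non-integer exon coordinate: %r" % field)
        coords.append(int(value))
    return coords


if __name__ == "__main__":
    print(parse_exon("1e+05-100500"))
```

This only treats the symptom. Whatever step writes the coordinate as '1e+05' in the first place is not identified in the log, so the notation may still appear in other files.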

2) INFO: Creating SIF file...
cat: /home/Maxim/software/FINDER/output/ChrsSoftMask/alignments/SRR11184196_round3_SJ.out.tab: No such file or directory
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:350: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  coverage_info[transcript_id]["bed_cov"] = np.array( temp )
Warning: couldn't find fasta record for 'ENA_OX243811_OX243811'!
Error: no genomic sequence available (check -g option!).
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:657: RuntimeWarning: divide by zero encountered in double_scalars
  ratio3 = round( np.average( coverage_3rd_portion ) / np.average( coverage_2nd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:656: RuntimeWarning: invalid value encountered in double_scalars
  ratio2 = round( np.average( coverage_2nd_portion ) / np.average( coverage_3rd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:657: RuntimeWarning: invalid value encountered in double_scalars
  ratio3 = round( np.average( coverage_3rd_portion ) / np.average( coverage_2nd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:656: RuntimeWarning: divide by zero encountered in double_scalars
  ratio2 = round( np.average( coverage_2nd_portion ) / np.average( coverage_3rd_portion ), 2 )
/softwares/FINDER/Finder/scripts/fixOverlappingAndMergedTranscripts.py:655: RuntimeWarning: divide by zero encountered in double_scalars
  ratio1 = round( np.average( coverage_2nd_portion ) / np.average( coverage_1st_portion ), 2 )
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/softwares/FINDER/Finder/scripts/removeRedundantTranscripts.py", line 22, in findSubsetTranscripts
    if transcripts_fasta[transcript_i] in transcripts_fasta[transcript_j]:
KeyError: 'ENA_OX243811_OX243811.1_0_covsplit.0'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/softwares/FINDER/Finder/finder", line 688, in <module>
    main()
  File "/softwares/FINDER/Finder/finder", line 649, in main
    orchestrateGeneModelPrediction( options, logger_proxy, logging_mutex )
  File "/softwares/FINDER/Finder/finder", line 500, in orchestrateGeneModelPrediction
    removeRedundantTranscripts( input_gtf_filename, output_gtf_filename, options )
  File "/softwares/FINDER/Finder/scripts/removeRedundantTranscripts.py", line 85, in removeRedundantTranscripts
    results = pool.map( findSubsetTranscripts, all_inputs )
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: 'ENA_OX243811_OX243811.1_0_covsplit.0'

I haven't found a possible solution for this issue. Hope you can patch these.
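
One thing that might be worth checking, as a guess rather than a confirmed diagnosis: the "couldn't find fasta record" warning in the log suggests that a sequence name used in the annotation does not match any header in the genome FASTA, which could plausibly leave that transcript out of the transcript FASTA and lead to the later KeyError. The sketch below compares the two sets of names. All function names are hypothetical and not part of Finder, and it assumes an uncompressed FASTA and a tab-separated GTF.

```python
from typing import Set


def fasta_ids(fasta_path: str) -> Set[str]:
    """Collect sequence IDs: the first whitespace-delimited token of each '>' header."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def gtf_seqnames(gtf_path: str) -> Set[str]:
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_sequences(fasta_path: str, gtf_path: str) -> Set[str]:
    """Return seqnames that occur in the GTF but have no FASTA record."""
    return gtf_seqnames(gtf_path) - fasta_ids(fasta_path)
```

If missing_sequences() returns a non-empty set for the genome and the intermediate GTF, renaming the FASTA headers or the GTF seqnames so that they agree would be the first thing to try.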

On a side note, the empty psiclass output and missing combined gff files that other people have reported could be caused by paired-end reads that were not split from reads.fastq into reads_1.fastq and reads_2.fastq with the SRA Toolkit's fastq-dump.
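
For reference, a small sketch of how that split can be requested. fastq-dump's --split-files option writes each read of a pair to its own file (accession_1.fastq and accession_2.fastq), and --outdir sets the output directory. The wrapper function and its name are hypothetical, and actually running it assumes the SRA Toolkit is installed and on PATH.

```python
import subprocess
from typing import List


def split_paired_reads(accession: str, outdir: str, run: bool = False) -> List[str]:
    """Build (and optionally run) a fastq-dump command that splits paired-end reads.

    Hypothetical wrapper for illustration. With run=False the command is
    only returned, so it can be inspected without the SRA Toolkit installed.
    """
    command = ["fastq-dump", "--split-files", "--outdir", outdir, accession]
    if run:
        # check=True raises CalledProcessError if fastq-dump exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    print(" ".join(split_paired_reads("SRR11184196", "reads")))
```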

sagnikbanerjee15 commented 1 year ago

Hello @Maxim-Karpov,

Thank you for your interest in finder. We are currently working on developing a different version of finder that will address most of the issues that are listed here. I do not anticipate making any further changes to the old version since the entire architecture of the new software will be hugely different.

Thanks for pointing out the issue with merged reads. We will definitely look into it.

Thank you.

Maxim-Karpov commented 1 year ago

Thanks for promptly getting back to me. When do you expect to release the new version of finder?