Open laurabaxter21 opened 1 year ago
Hello @laurabaxter21,
Thank you very much for your interest in Finder. We have decided to focus our attention on developing the second version of the software. At present we cannot support the older version due to a lack of personnel, and I sincerely apologize for that. If you want to follow up on this, please email me at sagnikbanerjee15@gmail.com and I will do my best to help you out.
Thank you.
Running the latest run_finder-v1.1.0. Everything runs fine until the codan step (Braker is complete), which finds a duplicate key and kills the pipeline. Looking at the assemblies_psiclass_modified/combined/combined_split_transcripts_with_bad_SJ_redundancy_removed.fasta file for duplicated sequence IDs, I find 2 (C2.27447_0_covsplit.0 and C7.149167_0_covsplit.0, both with different sequences in each of the duplicates).
Could I just delete these out from FASTA/gtf and continue from checkpoint 5?
assemblies_psiclass_modified/combined/cds_predict.error:
Traceback (most recent call last):
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 524, in <module>
    main()
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 506, in main
    _codanBOTH(options.transcripts, options.output_folder, options.model, options.cpu)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 355, in _codanBOTH
    _retrieveORFBOTH(transcripts, outF+"minus.fa", outF)
  File "/softwares/CODAN/CodAn-1.2/bin/codan.py", line 147, in _retrieveORFBOTH
    record_dictP = SeqIO.index(transcripts, "fasta")
  File "/usr/lib/python3/dist-packages/Bio/SeqIO/__init__.py", line 979, in index
    return _IndexedSeqFileDict(
  File "/usr/lib/python3/dist-packages/Bio/File.py", line 350, in __init__
    raise ValueError("Duplicate key '%s'" % key)
ValueError: Duplicate key 'C2.27447_0_covsplit.0'
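For anyone wanting to check their own run before codan is reached: the error comes from `SeqIO.index`, which by default keys each record on its ID (the first whitespace-delimited token of the header line) and refuses repeated keys. A minimal sketch for listing repeated IDs, using only the standard library, might look like the following. The function name `duplicated_fasta_ids` is made up for illustration and is not part of Finder or CodAn.

```python
from collections import Counter


def duplicated_fasta_ids(lines):
    """Return the IDs that appear on more than one FASTA header line.

    The ID is taken as the first whitespace-delimited token after '>',
    which is an assumption meant to mirror the default key used by
    Biopython's SeqIO.index. IDs are returned in first-seen order.
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return [seq_id for seq_id, n in counts.items() if n > 1]


# Example use on a file (the path is whatever FASTA you want to check):
#     with open(fasta_path) as handle:
#         print(duplicated_fasta_ids(handle))
```

It only reads header lines, so it does not tell you whether the repeated records carry the same sequence or different ones.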
I am having the same issue. Did you figure out a solution?
Hi, yes, as I recall I just deleted the offending duplicated sequences from the FASTA file and their corresponding entries from the gtf file (they didn't seem critically important). Then I re-ran finder from checkpoint 5 and it completed OK.
Hope that helps, Laura
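The manual deletion described above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, every copy of a duplicated record is dropped (not just the extra ones), and the gtf entries are assumed to reference the sequences through a `transcript_id "..."` attribute matching the FASTA ID. Check both assumptions against your own files, and work on copies.

```python
import re

# Assumed attribute format: transcript_id "C2.27447_0_covsplit.0";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def drop_fasta_records(lines, bad_ids):
    """Yield FASTA lines, skipping every record whose ID is in bad_ids."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = (fields[0] if fields else "") not in bad_ids
        if keep:
            yield line


def drop_gtf_entries(lines, bad_ids):
    """Yield GTF lines, skipping those whose transcript_id is in bad_ids."""
    for line in lines:
        match = _TRANSCRIPT_ID.search(line)
        if match and match.group(1) in bad_ids:
            continue
        yield line


# The two IDs reported in this issue.
bad_ids = {"C2.27447_0_covsplit.0", "C7.149167_0_covsplit.0"}
```

Each function takes any iterable of lines (for example an open file handle) and the filtered lines can be written out with `writelines`. After replacing the originals with the filtered files, the pipeline is restarted from checkpoint 5 as described.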
In reply to: Gregory M. Chorak, PhD, 07 June 2023 16:03. Subject: Re: [sagnikbanerjee15/Finder] codan fails and kills pipeline due to finding duplicate key(s) (Issue #76)
That worked for me also.
Thank you!
Greg