Closed RaverJay closed 2 years ago
Awesome, thanks for the quick fix! I can't test right now either, but I will by tomorrow evening at the latest ;)
On Sat, 18 Dec 2021, 19:05 Sebastian Krautwurst, @.***> wrote:
Quick fix for #184 https://github.com/replikation/poreCov/issues/184
I assume the pos is now the start of the amino acid before the insertion (e.g. R for R214REPE instead of E). This should give the correct amino acid and codon number now.
Please test, I can't right now =)
You can view, comment on, or merge this pull request online at:
https://github.com/replikation/poreCov/pull/185
Commit Summary
- 2642c8b https://github.com/replikation/poreCov/pull/185/commits/2642c8b0d0e48b1f1522e241d0864f7d03f73913 netclade pos is now the nucleotide before the insertion
File Changes
(1 file https://github.com/replikation/poreCov/pull/185/files)
- M bin/convert_insertions_nt2aa.py https://github.com/replikation/poreCov/pull/185/files#diff-b0f2b8e8d0fcc60bdb6af8231b6b5711bd381884370974025247e8b9bf1490ff (4)
Patch Links:
- https://github.com/replikation/poreCov/pull/185.patch
- https://github.com/replikation/poreCov/pull/185.diff
Hm @RaverJay, now nothing from Nextclade is reported for the sequences that have insertions : )
Argh, what o_o
Can you paste relevant lines from the files from the conversion process? (Original nextclade results and the converted nextclade results)
I suspect it's some small error but otherwise I can look at it tomorrow
Martin Hölzer @.***> wrote on Sun, 19 Dec 2021, 15:47:
Hm @RaverJay https://github.com/RaverJay now for the sequences that have insertions nothing from Nextclade is reported : )
[image: image] https://user-images.githubusercontent.com/14393703/146679191-48a59dee-377c-455c-bea6-bce7e941da9d.png
Ah, the two relevant processes (the ones where insertions are in the Nextclade output) actually failed:
[43/ce94f0] process > determine_mutations_wf:add_aainsertions (23) [100%] 23 of 23, failed: 2 ✔
From the .command.log
cat raver-work/c1/b95e0d5839bbbf7fe5e84222daf4b7/.command.log
LOG: Started convert_insertions_nt2aa.py ...
Traceback (most recent call last):
  File "raver-porecov/bin/convert_insertions_nt2aa.py", line 583, in <module>
    res_data.at[sample, 'aaInsertionsCustom'] = insertions_nt_to_aa(nt_ins) if type(nt_ins) == str else ''
  File "raver-porecov/bin/convert_insertions_nt2aa.py", line 564, in insertions_nt_to_aa
    aa_ins_list.append(gene + ':' + aa_before + str(codon) + aa_before + aminos)
TypeError: can only concatenate str (not "NoneType") to str
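The traceback only says that one operand of the concatenation was `None`, not which one. A minimal sketch of a guard that would make such a failure easier to diagnose, assuming the label is built from the same four values as in the failing line (the function name `format_aa_insertion` is hypothetical and not part of the actual script):

```python
def format_aa_insertion(gene, aa_before, codon, aminos):
    """Build a label like 'S:R214REPE' (hypothetical helper, not the script's code)."""
    parts = {"gene": gene, "aa_before": aa_before, "aminos": aminos}
    # Report which lookup came back empty instead of failing on '+'.
    missing = [name for name, value in parts.items() if value is None]
    if missing:
        raise ValueError(f"could not resolve {', '.join(missing)} for codon {codon}")
    return f"{gene}:{aa_before}{codon}{aa_before}{aminos}"


print(format_aa_insertion("S", "R", 214, "EPE"))
```

With a guard like this, a shifted position that no longer maps to a gene or codon would show up as a named missing value rather than a bare `TypeError`.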
The nextclade output and input for your conversion script is:
seqName clade qc.overallScore qc.overallStatus totalSubstitutions totalDeletions totalInsertions totalFrameShifts totalAminoacidSubstitutions totalAminoacidDeletions totalMissing totalNonACGTNs totalPcrPrimerChanges substitutions deletions insertions frameShifts aaSubstitutions aaDeletions missing nonACGTNs pcrPrimerChanges alignmentScore alignmentStart alignmentEnd qc.missingData.missingDataThreshold qc.missingData.score qc.missingData.status qc.missingData.totalMissing qc.mixedSites.mixedSitesThreshold qc.mixedSites.score qc.mixedSites.status qc.mixedSites.totalMixedSites qc.privateMutations.cutoff qc.privateMutations.excess qc.privateMutations.score qc.privateMutations.status qc.privateMutations.total qc.snpClusters.clusteredSNPs qc.snpClusters.score qc.snpClusters.status qc.snpClusters.totalSNPs qc.frameShifts.frameShifts qc.frameShifts.totalFrameShifts qc.frameShifts.frameShiftsIgnored qc.frameShifts.totalFrameShiftsIgnored qc.frameShifts.score qc.frameShifts.status qc.stopCodons.stopCodons qc.stopCodons.totalStopCodons qc.stopCodons.score qc.stopCodons.status errors
Samplename "21K (Omicron)" 10.145405 good 49 39 9 0 41 16 1160 0 5 C241T,A2832G,C3037T,T5386G,C5730T,G8393A,C10029T,C10449A,A11537G,T13195C,C14408T,C15240T,A18163G,C21762T,C21846T,G22578A,T22673C,C22674T,T22679C,C22686T,G22992A,C22995A,A23013C,A23040G,G23048A,A23055G,A23063T,T23075C,C23202A,A23403G,C23525T,T23599G,C23604A,G23948T,C24130A,A24424T,T24469A,C24503T,C25000T,C25584T,C26270T,G26709A,A27259C,C27807T,A28271T,C28311T,G28881A,G28882A,G28883C 6513-6515,11285-11293,21765-21770,21987-21995,22194-22196,28362-28370 22204:GAGCCAGAA E:T9I,M:A63T,N:P13L,N:R203K,N:G204R,ORF1a:K856R,ORF1a:T1822I,ORF1a:L2084I,ORF1a:A2710T,ORF1a:T3255I,ORF1a:P3395H,ORF1a:I3758V,ORF1b:P314L,ORF1b:I1566V,ORF9b:P10S,S:A67V,S:T95I,S:Y145D,S:L212I,S:G339D,S:S371L,S:S373P,S:S375F,S:S477N,S:T478K,S:E484A,S:Q493R,S:G496S,S:Q498R,S:N501Y,S:Y505H,S:T547K,S:D614G,S:H655Y,S:N679K,S:P681H,S:D796Y,S:N856K,S:Q954H,S:N969K,S:L981F N:E31-,N:R32-,N:S33-,ORF1a:S2083-,ORF1a:L3674-,ORF1a:S3675-,ORF1a:G3676-,ORF9b:E27-,ORF9b:N28-,ORF9b:A29-,S:H69-,S:V70-,S:G142-,S:V143-,S:Y144-,S:N211- 1-50,22786-22974,23612-23876,26299,26339-26694,26941,26943,26957-27177,29828-29903 Charité_E_F:C26270T,ChinaCDC_N_F:G28881A;G28882A;G28883C,USCDC_N1_P:C28311T 89343 0 29903 3000.000000 31.851852 mediocre 1160 10.000000 0.000000 good 0 24.000000 -8.000000 0.000000 good 0.000000 0.000000 good 0 0 0 0.000000 good 0 0.000000 good
That's the called insertion at the nucleotide level:
awk 'BEGIN{FS="\t"};{print $16}' sample_clade.tsv
insertions
22204:GAGCCAGAA
which corresponds to
EPE
so this should be fine.
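The translation can be double-checked with a small sketch using the standard genetic code. The `translate` helper below is illustrative only and is not taken from `convert_insertions_nt2aa.py`:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINOS)}


def translate(nt_seq):
    """Translate an in-frame nucleotide string into one-letter amino acids."""
    if len(nt_seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    codons = (nt_seq[i:i + 3] for i in range(0, len(nt_seq), 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("GAGCCAGAA"))  # EPE
```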
The question is what's at pos 22204 (it should be R for the insertion to be correct).
Okay, so it seems Nextclade changed the position so that it is now the last nucleotide of the preceding amino acid (R) instead of the first nucleotide of the first inserted aa (E); it was 22205 before and is 22204 now.
before: '22205:GAGCCAGAA' -> 'S:R214REPE'
new: '22204:GAGCCAGAA' -> 'S:R214REPE'
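The index shift can be sketched as follows. This is a sketch, not the code in the PR: it assumes 1-based coordinates and that the S gene starts at position 21563 in the Wuhan-Hu-1 reference (NC_045512.2). The function name and the `pos_is_preceding_nt` flag are hypothetical.

```python
S_GENE_START = 21563  # assumed 1-based start of S in NC_045512.2


def codon_before_insertion(pos, gene_start, pos_is_preceding_nt=True):
    """Return the 1-based codon number of the amino acid preceding an insertion.

    pos_is_preceding_nt=True:  pos is the last nucleotide of the preceding codon (new).
    pos_is_preceding_nt=False: pos is the first nucleotide after that codon (old).
    """
    if not pos_is_preceding_nt:
        pos -= 1  # map the old convention onto the new one
    offset = pos - gene_start
    # The last nucleotide of a codon sits at offset 2, 5, 8, ... from the gene start.
    if offset < 0 or offset % 3 != 2:
        raise ValueError("position is not the last nucleotide of a codon in this gene")
    return offset // 3 + 1


print(codon_before_insertion(22204, S_GENE_START))         # 214
print(codon_before_insertion(22205, S_GENE_START, False))  # 214
```

Both conventions resolve to codon 214, so the reported label stays `S:R214REPE`; only the arithmetic that gets there differs by one nucleotide.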
Changed the indexing accordingly, please test.
uff
well done!
I will merge this, thanks a lot for the quick fix @RaverJay !
I will also close the issue but open another one for SNPeff... this should be way more stable than using Nextclade, where we even auto-update the container and thus might not notice such changes that mess up the conversion.