nanopore-wgs-consortium / NA12878

Data and analysis for NA12878 genome on nanopore

Nucleotide symbol "U" rather than "T" found in cDNA #96

Closed zhenLiuXplr closed 3 years ago

zhenLiuXplr commented 3 years ago

Hi

I found many "U" symbols in the reads of NA12878-cDNA_All_Guppy_3.2.6.fastq.gz:

$ gunzip -c NA12878-cDNA_All_Guppy_3.2.6.fastq.gz | head -n 4
@05f796a6-4c57-41ad-aaf0-430fb591afb0 runid=0a7d4202ffd1076449fbdc3d8ba3d333c4117369 sampleid=Bham_Run1_20171115_cDNA read=58 ch=144 start_time=2017-11-16T17:59:32Z
UAGUAGGAAAAGAAAAAAAAGGAACUUUACGAGUGAGGAUGUAAAAUAAAGGUCACGAUUGGCGUCGUAUACUAAAAGUUUCUGACAUAUGUACAAAAGCAAUAGUAGCUGUCCCGGGCAACAACCAUCGUGAUGUUGUACUCGUAAUAAUUUUGUAGAUACUGGUUAUGUUACUUUAAGGUCUUCGGUAAGUGUGCACUGAUAUGGUUUAUAUUUUAACGAUUUAUUGUGUGUUCGUUUUGGUUGGGUGUCUCUGCUUUGUCAAAUACGACGAACUUUAAGGUCGUGUUCAGGAUUGUAGAGAUAAGUUCUAGAGUUACAUGAAGUUUUUCGUCGAUUAUAGGUAUUAUCAUUAUAAACACUCUACACUUUAGACCUAACUAUCGGAGUUUUAUAGUAGGGUUUUAUAUGCGGUUUAAUUUUCUAAUAAGAUUUACGAAAAGUGUAGAUGUAGUUCUUUGAUGUUUCUCGACGUUGUAAUAUCCUUUUAUCCAUAUUUGUAAACAAAGUUUCAAUGAAUGAGCACCAAUUGAAUACAAAGUAAACCUCAGUUAAUGAACGUCGGUCUUGUCAGUUGUAGAGGGAGACUACCAACUCCACAAUCACAAAGGUAGAAAGAUUCCGUUUGUGACGUAGGCUAAUAAGGUCUACUGACUUCUCUCUCUUAAGGUGUGGAGAAAAGUGUUUGCCUAACAAGCACCAGUGUAAAAUUACGAACCAGCUUGUCUUUGGACUACUAUUCUGAACAUUUCAAUUCUAGGUUUCCGUCUUCUUUUUGUAGAGCGGUCAUUUGCAACUCUGUCACUCCGCGCGUCGGACGACCAGGUAAGCUUCGGCAUCGUCAGUAGUACUUGCUCGAUAACACAUCUAUAUUGCCAUUUUAUAUUUCUUCUUCUUUUUCUUCUAUCUCGCUGUCCACGCUUGACUUCAGUUUGUUAGUAUUUGAGGUCCGCCCAGACUUAGGUUUCACAGUGAC
+
$$)+*%%(*)+$)*+/--+&01'#%&%$$$'$%'$)(''&$%$',-*--(+'%(($##&%)&"#)%$"%##$'$''($%%%*.&&'##$$#&%%(-(('$%%%()'%&+/../00'&$"%$$%$&%$$"$*&&$$$##$(&$"$%-*00/6:;:++0%('&0'''))$$$''$#($#%%&%&')'&%&(1'''%($$&4&')#'('('&%%%)*++(((##$&%$#%$$&')&%$$')1+)$$/0/*&$$%#'.03('%%$3&%&%%&&($##%$$&&&%$%%&$$&%'%%.*)&(($,&&%%'%+0.+*/477,')*&%%&#&'$(000/%*0-&%)(/-7'$%&$#%%#$$$$#&'%%$+)7+,*&$()&%&'.%%+,4(/.',$$&.244((,.,,)&%-.10%%).*%''010/.112283557;71<2<772+/8),+&+.,&(%#%(+*)*2;1--'&7553*%#&$$06./.-/.,&#&&('&$'(&'%++$#)+(%&$$$%&%$#&%&&%#%$&&&*.-&%,*+-+'(&&&'%%&*&4'1-+,)&&#&%'$%'*))%$#$##'$#%$$$%&%'&&$&(&%02.,'&0,-,%&$$%&'%$'%%$&)&&('(%##'&)0..)+%$#$*$%%+*'$#*$&&00(),,1(454$+6.*330//-$$$&'('&%---/'-#-45%*''37.2'&$$%%%#$#%&/+**,'')&$$"$$#))$&$+%%$&"%%##%'$&,,&%$$#%&%$((&&%'%&$$$###$%%%$%-.-,+&''*$%#%%&+*-562/&&,('*$,$#+&#$$$%#$$%&&&&).02931+&$%$%$"$##$$#$),$%%')').++%$&##'%&&'%$$%0($$%)((10$(4.-$$'-)'$(%*+4695444/6<8985.*&(243..0)%$++.$##'%$)).,+%&#$&'%%%$$$%$$'$#$$$$'#*'##$#(&$$%$%#$&$#

A "U" would be expected in a direct RNA FASTQ file, but not in cDNA reads. I observed this in the Guppy 3.2.6 basecalls [Update January 2020].
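
To gauge how widespread this is beyond the first record, the check can be extended to the whole file. The following is only a minimal sketch: the function name `count_u_reads` and the `max_reads` parameter are made up for illustration, and it assumes the file uses plain four-line FASTQ records (header, sequence, `+`, quality), as in the output above.

```python
import gzip


def count_u_reads(fastq_path, max_reads=None):
    """Return (reads_checked, reads_with_u) for a gzipped FASTQ file.

    Assumes four lines per record, so the sequence is always the
    second line of each record. Quality lines are never inspected,
    which matters because "U" is also a valid quality character.
    """
    checked = 0
    with_u = 0
    with gzip.open(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # skip header, "+", and quality lines
                continue
            checked += 1
            if "U" in line.upper():
                with_u += 1
            # Optionally stop early to sample only the start of the file.
            if max_reads is not None and checked >= max_reads:
                break
    return checked, with_u
```

For example, `count_u_reads("NA12878-cDNA_All_Guppy_3.2.6.fastq.gz", max_reads=100000)` would report how many of the first 100,000 reads contain at least one "U".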

mitenjain commented 3 years ago

Hello,

We just updated the data and fixed the issues in rel2. The new basecalls were produced with Guppy 4.2.2. Hope this helps.

Let us know if you have any questions.

zhenLiuXplr commented 3 years ago

Thank you, mitenjain. I am going to download the updated files. By the way, could you upload MD5 checksum files? My network connection has been congested, so I am not sure whether the downloaded files are complete.
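
Once a checksum is published, the comparison itself is straightforward. This is a rough sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_md5` are hypothetical, and the expected digest would have to come from whatever checksum file the maintainers provide.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that multi-gigabyte FASTQ archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with a published value,
    ignoring case and surrounding whitespace in the published value."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A mismatch would indicate a truncated or corrupted download, in which case the file should be fetched again.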