rrwick / Porechop

adapter trimmer for Oxford Nanopore reads
GNU General Public License v3.0

Install problem? #14

Closed fifthguy closed 7 years ago

fifthguy commented 7 years ago

Hi, right after installing ("porechop -h" works), I tried to run Porechop on sample data and got the following error:

porechop -i ./unclass.fastq -o ./b.fastq
Traceback (most recent call last):
  File "/usr/local/bin/porechop", line 11, in <module>
    load_entry_point('porechop==0.2.1', 'console_scripts', 'porechop')()
  File "/usr/local/lib/python3.4/dist-packages/porechop/porechop.py", line 33, in main
    reads, read_type = load_reads(args.input, args.verbosity, args.print_dest)
  File "/usr/local/lib/python3.4/dist-packages/porechop/porechop.py", line 196, in load_reads
    reads, read_type = load_fasta_or_fastq(input_filename)
  File "/usr/local/lib/python3.4/dist-packages/porechop/misc.py", line 125, in load_fasta_or_fastq
    return load_fastq(filename), 'FASTQ'
  File "/usr/local/lib/python3.4/dist-packages/porechop/misc.py", line 168, in load_fastq
    short_name = full_name.split()[0]
IndexError: list index out of range

Any ideas on how to fix it?

(Installed from the GitHub source via the python3 setup.py ... method.)

The unit tests passed, and porechop -i ... -o ... runs without problems on the test set.

The sample that produces the error above was basecalled with albacore with barcoding already on (but with a high number of unclassified reads).

Thanks in advance!

khyox commented 7 years ago

It seems there is something strange with the read names in your FASTQ file, such as a line with a single character. It might help if you could copy and paste the output of: awk 'NR % 4 == 1' unclass.fastq | head
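To make the suspected failure concrete, here is a minimal Python sketch of how an empty or single-character header line can lead to the IndexError in the traceback. The helper `short_name` is hypothetical: it only mirrors the `full_name.split()[0]` step shown in the traceback and is not Porechop's actual code. The read name in the example is made up.

```python
def short_name(header_line):
    """Return the first whitespace-separated token after the leading '@'.

    Hypothetical helper: it imitates the `full_name.split()[0]` step from
    the traceback, not the real Porechop implementation.
    """
    full_name = header_line.rstrip("\r\n")[1:]  # drop the newline and the '@'
    return full_name.split()[0]                 # fails when full_name is empty


# A well-formed header line (made-up read name) works as expected.
print(short_name("@read1 runid=abc read=176\n"))

# A blank line, or a line holding only '@', leaves nothing to split.
for bad in ["", "\n", "@"]:
    try:
        short_name(bad)
    except IndexError as error:
        print(repr(bad), "-> IndexError:", error)
```

An empty string splits into an empty list, so taking element 0 raises the same "list index out of range" message as in the report.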

fifthguy commented 7 years ago

Hi, thanks for responding so soon.

Here is the output of awk 'NR % 4 == 1' unclass.fastq | head:

@051a46b6-6180-4179-b1bc-31a06101ed88 runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=176 ch=245 start_time=2017-05-30T10:25:34Z barcode=unclassified

$%#$#(.2+@FEC=FDE52-'1;4=;9BC;EI737JJFIK=>:;?8898?==D<C17575.227F7@=?7C6//,,$338((3/.713HJ1@EKC@EHFC6//17)0+(---4C2996(3(5:-(')/-*-9.--7.(7226@30,45B>4@<21--'-7491C.2@1752854*116./'%%(,A57+&''($'+)+&&,-89124H9'-=6D5<+.8C:2(1(:.355C9:8A7)/4259218?/.-())'(4F49DF57&2+5CI9:D44,&2*(--.2>B,<>=@EC;04:8//*(',+477(1328?DIH5EJC/1246=4ED25>8594>9452FJJG/-=2;>/+--.+9+:/669:=:9=;?6DIB=./>244;F633.8CA9F0,'%)*/,*'3853)+/,--+339D@C7;98CFHIHCGGH540%-..@D,40215,51007165B5-&%$##&+327:G?1;7-&((-$% + TGGAAGAAGAAAAGAAAAGAAAGAAGAAGAAAGAAGAAAGGAAGAAGAAGGAAGACGAAAGGACGGTGCGAAGAAAGAAGAAAAAGATTTTGGTAGAAAAAAAGAAAAGCAGAAAAATTGCGCGCAGCTGCGCCGCGTGGCGGGCAAGGCGCCGTGACGCGAAGGATAACGCTGCTCGAACGTACATTTGACGTGCGTACACCACACGCCTCACCTCGATGGCGTGTTCGGGCGCTGGCGTTATCGGACGCTGGTTCGCGAGCAACGCGACGGACGTATATGCCTATCAGACAAGCGCGTTTTATCGTGCGGCGCCGATCTGTTGTTTGCTATCGGTGCGCGCGGCCTTCGCATGGTTTCGTTGGTGCCGTGGTTATGCTCTTCGCGCGGCCCGCTGCGTTGTGGTTTCGCGCGCAATATGTTCGCGCTGCCGTGACGTATTGTTTGTGCCTTCCATATACCTGGTCGTACCCGCGTGGACAAGCGTTGCGCCTCTTGTAAACCGGTACGCTAACATCGTTGAACGGCTGCCGGCCTTTGCGGTATCCGTGCGCAGCAAGCGTCTTTTACTTCCGTTGCGTGGTGGCGACGTTACGCTCATTTGAATTCGGTCAGAGCACCCGGTTGTGAAATCTGAAATGGGCACTAGTATGAGGCTGATGGTGAAAGCCAAATATACTGAATGAGGCAGAGAGCTGGTAGTAGCCTTTGAGGCATGACTCATCGGAGATAGAAACTCTCTTGGTACCAATGATTTCATTTCTTTGTCCATTAATTAATTGAATCATTGAGCACAATGGTAGAAAGTGATATTCTAGTAAATCAGTATCTGAAGGCTGATAAAGCTCGTCTCCGCAGTTATTCTTCATTCACCCAATTCAGAAAGCCCCTAATTTCTCAGGAAATCTGTTGGCACATTTAGAGCTATAATTGAAATATCTCAGAAGAGCTAAGAAATTTAGTGAGCTGAACGTTTTATTCATCTCGCAGAGGCCAGGCTACTGAGGAGTCATTAGTTTTAGTTTCTTTCCTCCATTCTAGCGAGTAAGAGTTTTGAAATCATGAGTGTGAAATATCTAATAATTTGAGGCTAGAGAAAACCTCAGAAACTGCTGTGGTGACGCCTTCATAAGCGCCATGACTATTTTCTTTCAGTGGTTTGAAACGCAAAACTTTGTGAGTAGGGATATTATCGGTACGCTGTTTTCGTGAGAAAGCCTCTCGGTAAAGCCTGGTAGAAACTCCTGACTCACAGCATGGAACATCTCCCGCGATGTGCCCTCTTTAAATGAGGCAGTTTGAAATATGTTTCATCTCTAGTGCAATTAAATCGGCCAGAGCCAATAGAAAGAAACCAGGGCGAAGCCAGAAACAAATTTATGAGAAGCTGACTGGCTGAGGCGGCGTGTCACTTGATCAAACTCTTTCTCATGATTTCGAGGCGCAATCTCAATCCCAGGCTGAAGTATGAAGAATCAATGGTAGAGAATGTCTTCAGTGAAGCTGAAAACAATC
CAATTTCTTTTCAGAGTAGTTCATCTCTGGAGATGAGCCTCTTAGACAATGAACAGAGCACTGTTGTCCATTCTGGAATGGTTGGTGGAGGCTCGTGAGTGTAGAAATCAAACTTATGAAAAGGATTGTGTCACTGGCTCTAAGC @f8b4e75f-5f3f-4758-ae82-b13cc91326bf runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=48 ch=223 start_time=2017-05-30T10:22:47Z barcode=unclassified

$$%''++'-'&'%,.,+45C/.'(-)/33=5B9F7>3130<-+)(()03+)-+7>%$-*+4&.+&.(''&3./0/0)&$&&)3/((,&&)$$%%%(%#'#(/%'+++'+,6+%16'$'+()5G:+()(&)/.&.(%&01-+/2/)-'$+'&/.(+')'&'$+530;.)()&)(&'(%(),.'%'--./5444456+3-,/0.,&'094,'(((33679:2.+)(((/2'00-)'''&,-,1/,+')+8EB)(,'027&)$'%&'+46??4555:775320,),,.&51$%#''(7/')(2-(&&%&((.(('+()&(+')%''&$.4/'+0(--/.5+)('%-.,&(&%%,,+,13&(%(',,+101'%&%&/)%&&&&'%('(',(5,(,/+'1,&,)&'+)12(($)10/(+)','%%('&&),)'%$-4@>+(/(0)%('&)($)((..-'*-5.'++%.,/)'+%(%+),',)+))*)(,-3=1&-'%&&,-+.-/8><@+-)&'+-:-1,01-0=52652,5FE/+-+&% + CGGGTGTGGCCTGGTTCAGGTTACGTTAGGGTTAGCCTGGGACACCGACAAACCTCTTCAATATCACGCGGACATCACGCATGCCTGAATCTGGACTACACGGACACGTGGCAGCCATGCAGTGAACATCACGCACCTGGAACATGAACACCTGGGGCCTGGATTGCCCGTAATGCACTGGACATCACACATGCCTGGTCGGACGCAAAATCTGGACGCCTGCTGTCTCGTCTGGACACACGGGCGCGTGTAATGCCATGCGTGGTATCACCTGAATATGAACACCCCACGCACCTGGGGCCTGGATTGCCGTAATGCACGGGCCTCACACATACCTGATGCGGACGCAAAATGCAGACACACACCGGACGTAATGCGCCGCACTGGACATCACATTATAGTACAGACACACACACACGCGGGCACGTGTGCCCATGAACCTGGGGCGTACACATACACAGCCACCAGACTACACGTTCACGGGCTATCATGCGCAAACTGAACACACGTTCTGGTTGCTCTTATGAGCCTGGGCTTACGTACGGGCGTGTGCCCATGTGCTGGTATCACGCACGCAAACCGGACACCGCACGGACATCACACGTACTGGGTGGACGCACACCGTCGGGCACGTGTGCCCGTGTGCCTGGTCATGCAAACCTGGGTTCGCGCACACCTGGACGCAGACTCATGCCTGGTGGGCACACGGACTGTGTAACCATGCGTCAGGCCTCTGCCTGAATATGGGCTACCTGGGCCTGGTGCCATGTCGCACCTGGGCCTATCACACATGCACAAATCTGGGCCGCAAAATGGGACTTCCCCCCTGGACATGTCTGCCAGCAATGCTGGGCCTCTATGCCAAACCGGGCGCACACACACCCGGACGTGCCCCATCAACCTGAGATACCCGCATGCAGCCTGGGCACCACCACGTTCTGGACATCATACCTGAGCCTGGGCCTGCCTGCCGGGCACGTATTACTCTACAAAACAAGCACCCGCGGGCCTCTTACACCTGGGCCTGCCACACCCACAGTCCACGGGCGCACACACGATTCACCGGGCCTCCACACAAAACTGGGCGCACCTGCACATACCACGGATTTTGCAGGTAACCATGGAACACCTGATACACCCCACCACGTTCACAGACGCACCCGCATTCACAGGGGACTCATACACAAGACCTGGACGCACACACACACGCGGGTTGCTGGAGGAGTTGATCAGGTGTCTGATGAACAACAACAATGCGTGAGCCCC

Otherwise, the result of head unclass.fastq is:

@051a46b6-6180-4179-b1bc-31a06101ed88 runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=176 ch=245 start_time=2017-05-30T10:25:34Z barcode=unclassified GTGTGCTTCGTTTGGTTACAAAGACATTTCTTCTTCTTTTCTGAAGAGGTAGCATTACTGAACACTTGCTCTAGGCAATTAGGCTGGATTCATCTCAGTTTGCACCTATTGAATTTCCATAGCCACCTTGGTCTGATTTCTAATATGGCAGAGGACAAAGCTAAGACGGATGGTGTGACTTGGATCTCTTGGTTAATGGTATTGGATTGGAATTTGTTTTGAATTTTTCCCGGAAACAGAAAATAGATGAATATCATAAATGCTTGAATTTTAAAAGACAGACTGCTATAGCTCATCCCTCTTCCCCTAACCTTTTCCCTGTGAGATTTCAGAATGGATAATATGGATAAAGGCACTTGATGTCAGGTGTACTATAATAAAATATGTGTCTTTCATTGAGGACCATTTAGATGTTGCTATAACTTTTCAGCGTAGAAAAAGGTATTTCCTAAACATTTTGGTTCTTTGTGGATAATTAGTGTGTTGTCATAGAATGACGCCACTTTGAAAGGCAGACAAAAATTTAGAAAAATTCAGATGTCTAAGTCTTTCTCACTAACCTGAAAACAGGTGCTGAAGAAAGAGACATCGATTATTATATGCAACC + $&')','+%,+1)+%(($#+1+%(('(15/24,'&(((%)()/1)&+&$+0:6=?::2-65+255===7-.5;-,'--+7;;64%,($%')/,(=6H6/8;;G55)7**46IH7)(+-+12)12742'-)-(+;G>;1:7556842),+3>6C>I0027<.1?=-/-&'**+4.*+))*>9=E?=:B36-044A5;.%(,,/*20)+3-8D7+'+.*(),-GII083),(0,%&**.75D7&&,.,(,+//43;3>34/+.)(*,3))%>A?..0)&/,:90&-5?9,,*(-+2A73615335602*<7=C<.?:+0*1-1-03JA>=4/>BDF@-<F101076887.,2-'),(,.3=??2,+/+0(7;9<J7DE>.>>724058BH>,+-/-2289BFGGKI(8:2(-:/-)),/.%(2..02*0)+9EKJ62,445G0;;;-,04791,2295?9+6*).,.2>11-2)/(-1.5=>DB69@80,+,',,-1???4;/0-8**6*1H;68B;)'0CC@6<I<;;86.-()2.39765+,).5?IJB?BEHIE@=BC==<<<B+-3')(&&,&$/%)('+.,'(#

@2c9e3944-f007-4424-b3ef-fe3ba5bd7250 runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=63 ch=342 start_time=2017-05-30T10:23:47Z barcode=unclassified TTCAACTTTCTTCAGCACCTCAGGGGGACTGTTTGACCACCACGCCAGGCTATGACAGGAGAACTTCTTAAATGCACGCACGAGGAGGTTTTGTTTGCATCCTCTCCAAGTTACCTAAAATTAAAACTGTCAACAACTTCAGAGACAAACTTCTTGCAGCGCTGGCTCCTAAGCAATTGAGTTCTGAACTCTGGCTTAACAGAGCTGTGGGTTTAGGAAGTAGTTTTGGATTCATCACTTCAGGCTCAGCCAGAGAAACGGAGATGCGTATGTTATAATCTCTTCTTCTAAACTTCCTCCTGGGTTACAGATCTCCTCTCTTCATTTGCATTATTAAGATCTCTCCTCATTTTTGAATCCCTCTCACCAACTGACTGTGGTTTTTAGGAAGATCTCTCCCTTCTCATGACTTACCTCCATCCTCTGTGTTCCCAAGTGCTGAGATGGGAGTGCTGAAAGGATGTCCGGTGTCTTTGTGTATAGCAATACGTAGCTTCT + $%#$#(.2+@FEC=FDE52-'1;4=;9BC;EI737JJFIK=>:;?8898?==D<C17575.227F7@=?7C6//,,$338((3/.713HJ1@EKC@EHFC6//17)0+(---4C2996(3(5:-(')/-*-9.--7.(7226@30,45B>4@<21--'-7491C.2@1752854*116./'%%(,A57+&''($'+)+&&,-89124H9'-=6D5<+.8C:2(1(:.355C9:8A7)/4259218?/.-())'(4F49DF57&2+5CI9:D44,&2*(--.2>B,<>=@EC;04:8//*(',+477(1328?DIH5EJC/1246=4ED25>8594>9452FJJG/-=2;>/+--.+9+:/669:=:9=;?6DIB=./>244;F633.8CA9F0,'%)*/,*'3853)+/,--+339D@C7;98CFHIHCGGH540%-..@D,40215,51007165B5-&%$##&+327:G?1;7-&((-$%

khyox commented 7 years ago

There's something weird with that output. Can you please try the following? head -40 unclass.fastq | awk 'NR % 4 == 1'

As with the first command I sent you (with the head after the awk), I would expect just 10 lines with the names of the first 10 reads. Something like:

@f787262b-8b5a-40f8-9061-cc51cea8e9da runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=117 ...
@ae0e55e4-dc61-46eb-b134-e6f462c581f3 runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=12 ...
@cc6bb3af-8184-4353-8ecd-04c9903d9820 runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=140 ...
@fc1933be-c6af-4f03-9711-3b62c74fe5e3 runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=140 ...
@7918d120-cb58-49a1-bb80-55ec548d5f8c runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=143 ...
@0d84b960-d584-4178-bf18-580fe4a1a39e runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=14 ...
@575ac590-11ab-436f-a7ca-ee6a863a4bed runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=14 ...
@0fe4971f-85e7-4132-97a0-cded34d43112 runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=153 ...
@00c84410-21d1-496c-9411-93d02cfad9cc runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=16 ...
@cc3db35b-2b3c-4f04-8cc0-4816e03a12cb runid=1b7dfb71321c465f1b7e32ebc7d673941eb04143 read=17 ...

... but that is not what you get. I guess there is a problem with the line feeds in your file. Maybe there is a blank line at the beginning too. Sorry for not being of more help!
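The pattern in the pasted output (a header, then a blank line, then a quality string, a "+" and a sequence) is what one would expect if each record were followed by an extra blank line, so that records span five lines instead of four. The following Python sketch illustrates this with made-up records; `make_records` and `every_fourth_line` are hypothetical helpers, and the second one imitates what awk 'NR % 4 == 1' selects.

```python
def make_records(count, blank_between):
    """Build a toy FASTQ as a list of lines (all names and data made up)."""
    lines = []
    for i in range(1, count + 1):
        lines += ["@read%d" % i, "SEQ%d" % i, "+", "QUAL%d" % i]
        if blank_between:
            lines.append("")  # the suspected extra line break after a record
    return lines


def every_fourth_line(lines):
    """Select lines 1, 5, 9, ... like awk 'NR % 4 == 1'."""
    return [line for number, line in enumerate(lines, start=1)
            if number % 4 == 1]


clean = every_fourth_line(make_records(4, blank_between=False))
broken = every_fourth_line(make_records(4, blank_between=True))
print(clean)   # ['@read1', '@read2', '@read3', '@read4']
print(broken)  # ['@read1', '', 'QUAL2', '+', 'SEQ4']
```

With four-line records the selection returns only headers. With a blank line after each record the selection drifts by one line per record, which matches the order seen in the output above.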

fifthguy commented 7 years ago

Hi,

I'll try it as soon as possible... Maybe an albacore issue... I'm trying out another fast5 to fastq converter (still trying using the albacore-processed fast5), to see if it gets any better... Anyhow, if it doesn't work and if the headers are indeed the only problem, it can be easily solved...

Just so I know, which albacore version do you normally use for basecalling?

Thanks for everything and I'll let you know! Thanks also for sharing the code!

(Sent by email in reply to khyox's comment of 2 Jun 2017, 18:24.)

fifthguy commented 7 years ago

Hi!

Thanks for your help, it actually works now, just not with the original albacore outputs. Please find the full description of what's been going on below, and please let me know what your impression of it is (if you can spare the time). Anyhow, I hope the remarks below might help the next person (if there is one) who runs into this problem.


I ran:

head -40 unclass.fastq | awk 'NR % 4 == 1'

and got:

@051a46b6-6180-4179-b1bc-31a06101ed88 runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=176 ch=245 start_time=2017-05-30T10:25:34Z barcode=unclassified

$%#$#(.2+@FEC=FDE52-'1;4=;9BC;EI737JJFIK=>:;?8898?==D<C17575.227F7@=?7C6//,,$338((3/.713HJ1@EKC@EHFC6//17)0+(---4C2996(3(5:-(')/-*-9.--7.(7226@30,45B>4@<21--'-7491C.2@1752854*116./'%%(,A57+&''($'+)+&&,-89124H9'-=6D5<+.8C:2(1(:.355C9:8A7)/4259218?/.-())'(4F49DF57&2+5CI9:D44,&2*(--.2>B,<>=@EC;04:8//*(',+477(1328?DIH5EJC/1246=4ED25>8594>9452FJJG/-=2;>/+--.+9+:/669:=:9=;?6DIB=./>244;F633.8CA9F0,'%)*/,*'3853)+/,--+339D@C7;98CFHIHCGGH540%-..@D,40215,51007165B5-&%$##&+327:G?1;7-&((-$% + TGGAAGAAGAAAAGAAAAGAAAGAAGAAGAAAGAAGAAAGGAAGAAGAAGGAAGACGAAAGGACGGTGCGAAGAAAGAAGAAAAAGATTTTGGTAGAAAAAAAGAAAAGCAGAAAAATTGCGCGCAGCTGCGCCGCGTGGCGGGCAAGGCGCCGTGACGCGAAGGATAACGCTGCTCGAACGTACATTTGACGTGCGTACACCACACGCCTCACCTCGATGGCGTGTTCGGGCGCTGGCGTTATCGGACGCTGGTTCGCGAGCAACGCGACGGACGTATATGCCTATCAGACAAGCGCGTTTTATCGTGCGGCGCCGATCTGTTGTTTGCTATCGGTGCGCGCGGCCTTCGCATGGTTTCGTTGGTGCCGTGGTTATGCTCTTCGCGCGGCCCGCTGCGTTGTGGTTTCGCGCGCAATATGTTCGCGCTGCCGTGACGTATTGTTTGTGCCTTCCATATACCTGGTCGTACCCGCGTGGACAAGCGTTGCGCCTCTTGTAAACCGGTACGCTAACATCGTTGAACGGCTGCCGGCCTTTGCGGTATCCGTGCGCAGCAAGCGTCTTTTACTTCCGTTGCGTGGTGGCGACGTTACGCTCATTTGAATTCGGTCAGAGCACCCGGTTGTGAAATCTGAAATGGGCACTAGTATGAGGCTGATGGTGAAAGCCAAATATACTGAATGAGGCAGAGAGCTGGTAGTAGCCTTTGAGGCATGACTCATCGGAGATAGAAACTCTCTTGGTACCAATGATTTCATTTCTTTGTCCATTAATTAATTGAATCATTGAGCACAATGGTAGAAAGTGATATTCTAGTAAATCAGTATCTGAAGGCTGATAAAGCTCGTCTCCGCAGTTATTCTTCATTCACCCAATTCAGAAAGCCCCTAATTTCTCAGGAAATCTGTTGGCACATTTAGAGCTATAATTGAAATATCTCAGAAGAGCTAAGAAATTTAGTGAGCTGAACGTTTTATTCATCTCGCAGAGGCCAGGCTACTGAGGAGTCATTAGTTTTAGTTTCTTTCCTCCATTCTAGCGAGTAAGAGTTTTGAAATCATGAGTGTGAAATATCTAATAATTTGAGGCTAGAGAAAACCTCAGAAACTGCTGTGGTGACGCCTTCATAAGCGCCATGACTATTTTCTTTCAGTGGTTTGAAACGCAAAACTTTGTGAGTAGGGATATTATCGGTACGCTGTTTTCGTGAGAAAGCCTCTCGGTAAAGCCTGGTAGAAACTCCTGACTCACAGCATGGAACATCTCCCGCGATGTGCCCTCTTTAAATGAGGCAGTTTGAAATATGTTTCATCTCTAGTGCAATTAAATCGGCCAGAGCCAATAGAAAGAAACCAGGGCGAAGCCAGAAACAAATTTATGAGAAGCTGACTGGCTGAGGCGGCGTGTCACTTGATCAAACTCTTTCTCATGATTTCGAGGCGCAATCTCAATCCCAGGCTGAAGTATGAAGAATCAATGGTAGAGAATGTCTTCAGTGAAGCTGAAAACAATC
CAATTTCTTTTCAGAGTAGTTCATCTCTGGAGATGAGCCTCTTAGACAATGAACAGAGCACTGTTGTCCATTCTGGAATGGTTGGTGGAGGCTCGTGAGTGTAGAAATCAAACTTATGAAAAGGATTGTGTCACTGGCTCTAAGC @f8b4e75f-5f3f-4758-ae82-b13cc91326bf runid=63d14249763dc8c2b6c05a50dc57ea68bd5af1b1 read=48 ch=223 start_time=2017-05-30T10:22:47Z barcode=unclassified

$$%''++'-'&'%,.,+45C/.'(-)/33=5B9F7>3130<-+)(()03+)-+7>%$-*+4&.+&.(''&3./0/0)&$&&)3/((,&&)$$%%%(%#'#(/%'+++'+,6+%16'$'+()5G:+()(&)/.&.(%&01-+/2/)-'$+'&/.(+')'&'$+530;.)()&)(&'(%(),.'%'--./5444456+3-,/0.,&'094,'(((33679:2.+)(((/2'00-)'''&,-,1/,+')+8EB)(,'027&)$'%&'+46??4555:775320,),,.&51$%#''(7/')(2-(&&%&((.(('+()&(+')%''&$.4/'+0(--/.5+)('%-.,&(&%%,,+,13&(%(',,+101'%&%&/)%&&&&'%('(',(5,(,/+'1,&,)&'+)12(($)10/(+)','%%('&&),)'%$-4@>+(/(0)%('&)($)((..-'*-5.'++%.,/)'+%(%+),',)+))*)(,-3=1&-'%&&,-+.-/8><@+-)&'+-:-1,01-0=52652,5FE/+-+&% + CGGGTGTGGCCTGGTTCAGGTTACGTTAGGGTTAGCCTGGGACACCGACAAACCTCTTCAATATCACGCGGACATCACGCATGCCTGAATCTGGACTACACGGACACGTGGCAGCCATGCAGTGAACATCACGCACCTGGAACATGAACACCTGGGGCCTGGATTGCCCGTAATGCACTGGACATCACACATGCCTGGTCGGACGCAAAATCTGGACGCCTGCTGTCTCGTCTGGACACACGGGCGCGTGTAATGCCATGCGTGGTATCACCTGAATATGAACACCCCACGCACCTGGGGCCTGGATTGCCGTAATGCACGGGCCTCACACATACCTGATGCGGACGCAAAATGCAGACACACACCGGACGTAATGCGCCGCACTGGACATCACATTATAGTACAGACACACACACACGCGGGCACGTGTGCCCATGAACCTGGGGCGTACACATACACAGCCACCAGACTACACGTTCACGGGCTATCATGCGCAAACTGAACACACGTTCTGGTTGCTCTTATGAGCCTGGGCTTACGTACGGGCGTGTGCCCATGTGCTGGTATCACGCACGCAAACCGGACACCGCACGGACATCACACGTACTGGGTGGACGCACACCGTCGGGCACGTGTGCCCGTGTGCCTGGTCATGCAAACCTGGGTTCGCGCACACCTGGACGCAGACTCATGCCTGGTGGGCACACGGACTGTGTAACCATGCGTCAGGCCTCTGCCTGAATATGGGCTACCTGGGCCTGGTGCCATGTCGCACCTGGGCCTATCACACATGCACAAATCTGGGCCGCAAAATGGGACTTCCCCCCTGGACATGTCTGCCAGCAATGCTGGGCCTCTATGCCAAACCGGGCGCACACACACCCGGACGTGCCCCATCAACCTGAGATACCCGCATGCAGCCTGGGCACCACCACGTTCTGGACATCATACCTGAGCCTGGGCCTGCCTGCCGGGCACGTATTACTCTACAAAACAAGCACCCGCGGGCCTCTTACACCTGGGCCTGCCACACCCACAGTCCACGGGCGCACACACGATTCACCGGGCCTCCACACAAAACTGGGCGCACCTGCACATACCACGGATTTTGCAGGTAACCATGGAACACCTGATACACCCCACCACGTTCACAGACGCACCCGCATTCACAGGGGACTCATACACAAGACCTGGACGCACACACACACGCGGGTTGCTGGAGGAGTTGATCAGGTGTCTGATGAACAACAACAATGCGTGAGCCCC

Basically no different from before; then I checked the results of the 'nanopolish extract' tool and got what you anticipated:

@c3135cb5-8315-42b1-be40-be84473d8f45_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch495_read21188_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch495_read21188_strand.fast5
@d2be6ac7-580f-484a-b20c-fdf5a3d82a5c_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch450_read23938_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch450_read23938_strand.fast5
@7688692d-51ed-4680-b669-8780dcd59222_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch287_read17099_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch287_read17099_strand.fast5
@7e7e9302-d245-4858-bdec-510d7ea85a8e_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch240_read11811_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch240_read11811_strand.fast5
@3dadd016-c36b-4d17-9643-d4bb9e379fc2_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch236_read12378_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch236_read12378_strand.fast5
@8c730d3a-990d-4738-a994-49cb4d20feaa_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch226_read14178_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch226_read14178_strand.fast5
@cbc85dbe-9d13-4b86-a8bb-46009c27bc84_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch123_read15081_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch123_read15081_strand.fast5
@b643c026-0c8e-45b0-970c-42b4190fc2b7_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch369_read27295_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch369_read27295_strand.fast5
@4f29760a-5a20-4db1-a80f-83e41c1c73bf_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch402_read3760_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch402_read3760_strand.fast5
@f3af6697-d342-402f-95b8-a342cad01a71_Basecall_1D_template:1D_000:template Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch297_read17900_strand called/workspace/barcode02/22/Fito_Minion_W7_20170530_FNFAH02076_MN21448_sequencing_run_test4_6_8_MCV_2070530_57419_ch297_read17900_strand.fast5

From the actual machine I got fast5 traces (they seem to be fully binary or hex, or something like it) and basecalled them with albacore 1.1.2 with the barcoding option on, outputting both fast5 and fastq. At that point I didn't notice that anything could be wrong with the fastq files, and the output fast5 files seemed 'a bit less binary' than the originals. It took me a while to figure out that "nanopolish extract" and similar tools don't actually basecall from the binary fast5 but work on the already basecalled fast5 (after albacore)...

In the end I ran the nanopolish command on the "semi"-binary fast5 files and got "functional" fastqs. Porechop is currently running fine, without the previous error. It seems that there was something aberrant going on with albacore during fastq output... Interestingly, however, I used "canu" to try to assemble the reads from the original albacore fastq output and it still worked.


Thanks again! (Should I close this issue?)

khyox commented 7 years ago

Hi @fifthguy! Curious issue. Thank you very much for the detailed explanation of your different tests and the final solution! I am sure this will help others if they run into a similar problem. Perhaps @rrwick (as the author of this wonderful piece of code) wants to add something, but I think it's perfectly OK to close the issue. 😃

rrwick commented 7 years ago

Hi all, sorry to come to this one so late. I too have noticed this problem: Albacore v1.1 puts extra line breaks between reads. This would break quite a lot of FASTQ parsers, including the naive one in Porechop. They seem to have fixed it in v1.2, though, so hopefully a thing of the past!
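For anyone stuck with files that contain such extra line breaks, one possible workaround is to drop blank lines before parsing. The Python sketch below shows the idea; `read_fastq_skipping_blank_lines` is a hypothetical helper, not part of Porechop or Albacore. It assumes plain four-line FASTQ records (no wrapped sequences) and no zero-length reads, since an empty sequence or quality line would be indistinguishable from a stray blank line.

```python
def read_fastq_skipping_blank_lines(lines):
    """Yield (name, sequence, quality) tuples, ignoring blank lines.

    Hypothetical helper. Assumes four-line records and no zero-length reads.
    """
    kept = [line.rstrip("\r\n") for line in lines]
    kept = [line for line in kept if line.strip()]  # drop blank lines
    if len(kept) % 4 != 0:
        raise ValueError("line count is not a multiple of 4 after cleaning")
    for start in range(0, len(kept), 4):
        header, sequence, separator, quality = kept[start:start + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed record near cleaned line %d" % (start + 1))
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in " + header)
        yield header[1:], sequence, quality


# Toy input with a blank line after each record (made-up reads).
toy = ["@r1 extra\n", "ACGT\n", "+\n", "IIII\n", "\n",
       "@r2\n", "GG\n", "+\n", "##\n", "\n"]
for name, sequence, quality in read_fastq_skipping_blank_lines(toy):
    print(name, sequence, quality)
```

Reading the whole file into memory keeps the sketch short; for large read sets a streaming version would be preferable.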

khyox commented 7 years ago

Thanks for the update Ryan! 😃