jflucier / ILL_pipelines

Isabelle Laforest-Lapointe Laboratory code

assembly repair.sh seems to produce an empty file #51

Closed jorondo1 closed 1 year ago

jorondo1 commented 1 year ago

I'm testing the assembly on another set of samples and I get a strange error, as if one of the two fastq files were empty after the reordering step in the assembly script (here, I think).

See the log at /nfs3_ib/ip29-ib/ip29/ilafores_group/jo/Plachokova2021/assembly/logs/assembly_bin_refinement-104195_1.slurm.out

jflucier commented 1 year ago

I'll check this soon.

jflucier commented 1 year ago
|09:22:06|jflucier@ip29:[ILL_pipelines]> cd /nfs3_ib/ip29-ib/ip29/ilafores_group/jo/Plachokova2021/preproc/S_3527/
|09:24:45|jflucier@ip29:[S_3527]> ml bbmap/38.86
For improved speed, add 'usejni=t' to the command line of BBMap tools which support the use of the compiled jni C code.
|09:26:28|jflucier@ip29:[S_3527]> repair.sh \
> in=S_3527_paired_1.fastq \
> in2=S_3527_paired_2.fastq \
> out=tmp/S_3527_paired_sorted_1.fastq \
> out2=tmp/S_3527_paired_sorted_2.fastq
java -ea -Xmx91722m -cp /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bbmap/38.86/current/ jgi.SplitPairsAndSingles rp in=S_3527_paired_1.fastq in2=S_3527_paired_2.fastq out=tmp/S_3527_paired_sorted_1.fastq out2=tmp/S_3527_paired_sorted_2.fastq
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Executing jgi.SplitPairsAndSingles [rp, in=S_3527_paired_1.fastq, in2=S_3527_paired_2.fastq, out=tmp/S_3527_paired_sorted_1.fastq, out2=tmp/S_3527_paired_sorted_2.fastq]

Set INTERLEAVED to false
Started output stream.

Input:                      8266106 reads       1117210396 bases.
Result:                     8266106 reads (100.00%)     1117210396 bases (100.00%)
Pairs:                      0 reads (0.00%)     0 bases (0.00%)
Singletons:                 8266106 reads (100.00%)     1117210396 bases (100.00%)

Time:                           28.672 seconds.
Reads Processed:       8266k    288.30k reads/sec
Bases Processed:       1117m    38.96m bases/sec

The reads don't seem to pair between the two fastq files: repair.sh reports 0 pairs and 100% singletons. Looking at the headers of the fastq entries, that does seem to be the case:

|09:32:06|jflucier@ip29:[S_3527]> head S_3527_paired_1.fastq
@ERR5383527.82.1
CGATACGCTAAGGACGCAACCCTGCGCGAGGAGGTCCTGCCAAAGCTGCGGGGCATCGCCGCCCACGCCCTGGGAATCTACCGGAACGGCCCCGACCTGGAGGCGGCCCGCTCCGCCGTGAAGGCCATCCTGGACGAGCCGGCGACGGCA
+
FFF:FFFFFFFFFFF:FFF:FFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFF
@ERR5383527.83.1
CATTGACGAGATTTCCTCGTTCAAGTCTTATCAGGCAAAACGGTTTCGTGCACTTATGAAACTGCGTCCCGCAGTTAAACGCATCGTCGGATTGACGGGCACGCCATCTGCGAACGGTCTCATGGATCTCTGGGCAGAATTCCGGCTTCT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ERR5383527.86.1
ATCCCCGTAAGCCGAATGAAATGGTACGCGGTGTGGTAACCTTGCCCCACGGTACGGGTAGAGTGGTACGGGTGTTGGCTCTGTGCCTGCCCGAGAAGGAGGAAGAGGCGCGTGCGGCGGGTGCAGACTACGTAGGACTCGACGAGTATG

vs

|09:32:17|jflucier@ip29:[S_3527]> head S_3527_paired_2.fastq
@ERR5383527.16.2
CGCACACCTACGACGAGAACGGGCGGCTGGCAACGACGACGGACGCGCTGGGGAACACGACGCAGTACCATTACTCGGCGGACGGCCGCCTCCTGTCGATG
+
FFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF,FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F
@ERR5383527.20.2
TACAATAGAAAACAAAATACTTCCCGTACCGGGATCAATATACAATAATGGAATCATAACAACACCCCTTTCCGCATTATTTACTATTAA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF
@ERR5383527.23.2
ATGGCTGGCATGGGGCTGAACTTTGGCATGGTGCTCGGAGCCATGGCGGGCCAGATTGGGCTGATCCTCGTCCTGGACTGGGCGATCGTGGGCATCCCCGGTGTCGCGCTGTCCATGATGATCGGGACGCCCATCGCCGTGCTGCTGGGC

I tried sorting another way, and the reads still don't pair when repair.sh is run afterwards:

|09:45:56|jflucier@ip29:[S_3527]> cat S_3527_paired_1.fastq | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > S_3527_paired_1.sort.fastq
|09:48:17|jflucier@ip29:[S_3527]> cat S_3527_paired_2.fastq | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > S_3527_paired_2.sort.fastq
|09:50:21|jflucier@ip29:[S_3527]> head S_3527_paired_*.sort.fastq
==> S_3527_paired_1.sort.fastq <==
@ERR5383527.10000000.1
CGCAGCGGCGACTGTCTAAATTCATCAGGTAGTTGCTCCAACTTTCCTCATAGATTATCAGCACATTGGGCTGTTTGTGCTTCATCGTA
+
FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@ERR5383527.10000001.1
CGCCAGCGCGGGCCAGCTCGTCGCTGAAGGCCGGGTCCTCCGGACCGTCAAGCGGGAGCCCCAACTCCTTCGCCACGGGCTCCAGCGTCGTCCGCGGCAGGGGGAAGTACCCGCCGCTGTGGCTCAGGAGGTGGGCCAGCCTCACGGGGT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFF,FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF
@ERR5383527.10000009.1
AAGTCTTACTTTCCAACGCTACAAACAACATCAATCAGATTGCAAAGCGAGTAAATTCGACAGGCATAATCTATAAAGACGATATAAACGATATTAAAAAGCAGATTGAACATTTCTCAAAAGAGCTGTGGCAAATTCATTCACTACTTC

==> S_3527_paired_2.sort.fastq <==
@ERR5383527.10000000.2
CGACCTTGTGCCCAACGCCGCCTATATGTTGAAGAAGGCTGTGAAGGAAAAGAAAAACGCTTTTGCCTTTCGTAGCATCGGGAGTTTGCTCTCCGAATATAAGTTCAAGTCGTTGAAGGAAGCCATAAACGTTTACACACA
+
FFFF:FF::FFFFFFFFFF:FFF:FFFF,FFFFFFFF:FFFF,FFFFF,FFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF,FFF,FFFFFFFFFFFFFFFFFFFFFFF,F
@ERR5383527.10000001.2
ATCTTGAGCGTCGGTACGAGGAGTATCTGGCGGAGGTGATGGAGGTTCACCGCGCCCGGGGCGTCGCCGTGGCCGTGATCGACCGGGTGGGGCGCACGCTTTGGCAGAAGTTGGCCGGATACCGCGACGCGGAGCGCCGCCTGGCGATCG
+
F,FFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFF
@ERR5383527.10000009.2
TTCAGGAGTTGCTTCTCCCGGTAAAAAGGATTGAATTAGATGCCTAGCAAGAACAGTTCCGTTTGTTCCGGCATCATTTCTTGTTCTTAAAAATTGTTTGTAAGCAGTTTCTTGATGGCATTTATGAGTGCTTACTAAAAGCTGATCGTC
|09:53:09|jflucier@ip29:[S_3527]> repair.sh \
in=S_3527_paired_1.sort.fastq \
in2=S_3527_paired_2.sort.fastq \
out=tmp/S_3527_paired_sorted_1.fastq \
out2=tmp/S_3527_paired_sorted_2.fastq
java -ea -Xmx91777m -cp /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bbmap/38.86/current/ jgi.SplitPairsAndSingles rp in=S_3527_paired_1.sort.fastq in2=S_3527_paired_2.sort.fastq out=tmp/S_3527_paired_sorted_1.fastq out2=tmp/S_3527_paired_sorted_2.fastq
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Executing jgi.SplitPairsAndSingles [rp, in=S_3527_paired_1.sort.fastq, in2=S_3527_paired_2.sort.fastq, out=tmp/S_3527_paired_sorted_1.fastq, out2=tmp/S_3527_paired_sorted_2.fastq]

Set INTERLEAVED to false
Started output stream.

Input:                      8266106 reads       1117210396 bases.
Result:                     8266106 reads (100.00%)     1117210396 bases (100.00%)
Pairs:                      0 reads (0.00%)     0 bases (0.00%)
Singletons:                 8266106 reads (100.00%)     1117210396 bases (100.00%)

Time:                           28.171 seconds.
Reads Processed:       8266k    293.42k reads/sec
Bases Processed:       1117m    39.66m bases/sec

I'm not sure what's going on... I'll try to investigate this again later. In the meantime, maybe you'll manage to demystify this bug with the fastq files.
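
The sorted listings above show the same read numbers in both files (10000000, 10000001, 10000009), differing only in the trailing `.1` or `.2`. One way to confirm that the mates really are present is to count the read IDs shared by the two files once that suffix is stripped. This is a minimal sketch, not part of the pipeline: the function name `count_shared_ids` is hypothetical, and it assumes four-line FASTQ records whose headers end in `.1` or `.2`, as in the files shown here.

```shell
# Count read IDs present in both FASTQ files, ignoring a trailing
# ".1" / ".2" mate suffix (assumed naming scheme, as in this thread).
# Note: all IDs of the first file are held in memory.
count_shared_ids() {
    awk '
        FNR % 4 == 1 {                      # header line of each 4-line record
            id = substr($1, 2)              # drop the leading "@"
            sub(/\.[12]$/, "", id)          # drop the mate suffix
            if (FNR == NR) {
                seen[id] = 1                # first file: remember the ID
            } else if (id in seen) {
                shared++                    # second file: ID also seen in file 1
                delete seen[id]             # count each ID only once
            }
        }
        END { print shared + 0 }
    ' "$1" "$2"
}

# Usage:
# count_shared_ids S_3527_paired_1.fastq S_3527_paired_2.fastq
```

If the printed count is close to the number of records per file while repair.sh still reports 0 pairs, the problem lies in how the names are matched rather than in missing mates.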

jflucier commented 1 year ago

I tried with another sample and it's the same bug:

# cd /nfs3_ib/ip29-ib/ip29/ilafores_group/jo/Plachokova2021/preproc/S_3531
|10:01:30|jflucier@ip29:[S_3531]> repair.sh \
> in=S_3531_paired_1.fastq \
> in2=S_3531_paired_2.fastq \
> out=tmp/S_3531_paired_sorted_1.fastq \
> out2=tmp/S_3531_paired_sorted_2.fastq
java -ea -Xmx91786m -cp /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bbmap/38.86/current/ jgi.SplitPairsAndSingles rp in=S_3531_paired_1.fastq in2=S_3531_paired_2.fastq out=tmp/S_3531_paired_sorted_1.fastq out2=tmp/S_3531_paired_sorted_2.fastq
Picked up JAVA_TOOL_OPTIONS: -Xmx2g
Executing jgi.SplitPairsAndSingles [rp, in=S_3531_paired_1.fastq, in2=S_3531_paired_2.fastq, out=tmp/S_3531_paired_sorted_1.fastq, out2=tmp/S_3531_paired_sorted_2.fastq]

Set INTERLEAVED to false
Started output stream.

Input:                      11978074 reads      1677633161 bases.
Result:                     11978074 reads (100.00%)    1677633161 bases (100.00%)
Pairs:                      0 reads (0.00%)     0 bases (0.00%)
Singletons:                 11978074 reads (100.00%)    1677633161 bases (100.00%)

Time:                           48.952 seconds.
Reads Processed:      11978k    244.69k reads/sec
Bases Processed:       1677m    34.27m bases/sec

We would probably need to tweak the headers for this to work, changing @ERR5383531.244.1 to @ERR5383531.244/1 and @ERR5383531.244.2 to @ERR5383531.244/2.

To be tested!
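
The header change proposed above could be tried with a small filter like the one below. It is a sketch under assumptions: the function name `dot_to_slash_headers` and the output file names are hypothetical, records are assumed to span exactly four lines, and headers are assumed to end in `.1` or `.2` with no trailing description. Whether repair.sh then pairs the reads is exactly what remains to be tested.

```shell
# Rewrite FASTQ headers ending in ".1" or ".2" so they end in "/1" or "/2".
# Only header lines (lines 1, 5, 9, ...) are modified; sequence and
# quality lines are printed unchanged.
dot_to_slash_headers() {
    awk '
        NR % 4 == 1 {
            sub(/\.1$/, "/1")
            sub(/\.2$/, "/2")
        }
        { print }
    ' "$1"
}

# Usage (output names are placeholders):
# dot_to_slash_headers S_3531_paired_1.fastq > S_3531_paired_1.slash.fastq
# dot_to_slash_headers S_3531_paired_2.fastq > S_3531_paired_2.slash.fastq
```

Restricting the substitution to header lines matters because `.` is a legal quality character, so a quality string can also end in `.1` or `.2`.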

jflucier commented 1 year ago

Did you manage to get repair.sh working in the end?

jflucier commented 1 year ago

@jorondo1

Can you confirm whether the bug still occurs since the 2000 releases I've made?

Thanks

jflucier commented 1 year ago

Never mind... I removed the use of bbmap. I now diagnose directly from the headers.
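
The thread does not show what the header-based check looks like. Purely as an illustration, and not the code used in ILL_pipelines, one such check walks both files in lock-step and verifies that record i in each file carries the same read ID. The function name `fastq_in_sync` is hypothetical, and the sketch assumes four-line records with bare headers ending in `.1` or `.2`.

```shell
# Succeed (exit status 0) only if both FASTQ files have the same number of
# records and the i-th records share the same ID once a trailing
# ".1" / ".2" mate suffix is removed.
fastq_in_sync() {
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {                       # header lines of both files, side by side
            id1 = $1; id2 = $2
            sub(/^@/, "", id1);     sub(/^@/, "", id2)
            sub(/\.[12]$/, "", id1); sub(/\.[12]$/, "", id2)
            # an empty ID means one file ran out of records first
            if (id1 == "" || id1 != id2) { bad = 1; exit }
        }
        END { exit bad + 0 }
    '
}

# Usage:
# if fastq_in_sync S_3527_paired_1.fastq S_3527_paired_2.fastq; then
#     echo "mates are in the same order"
# else
#     echo "files are out of sync"
# fi
```

Unlike a set comparison of IDs, this stops at the first mismatch and needs no memory beyond the current line, but it only answers whether the files are already in matching order.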