kangxiongbin / StrainXpress

StrainXpress is a de novo assembly method based on the overlap-layout-consensus (OLC) paradigm that can quickly and accurately assemble high-complexity metagenome sequencing data at strain resolution.
GNU General Public License v3.0

Error 'cmd_polyte.sh: No such file or directory' #3

Closed cifuj closed 2 years ago

cifuj commented 2 years ago

Hi,

I get an error while running StrainXpress. I just pulled the latest version (Update pipeline_per_stage.v3.py) and I'm using the following command:

$ python ~/StrainXpress/scripts/strainxpress.py -fq 5383_B.fq -fast -t 48 -average_read_len 150

And I got the following error:

pid 67085's current affinity mask: ffffffffffffffffffff
pid 67085's new affinity mask: ff
begin...
##################################################
the 1/1 part start...
this is the: 0 for 100w lines
this is the: 1000000 for 100w lines
this is the: 2000000 for 100w lines
this is the: 3000000 for 100w lines
this is the: 4000000 for 100w lines
this is the: 5000000 for 100w lines
this is the: 6000000 for 100w lines
this is the: 7000000 for 100w lines
this is the: 8000000 for 100w lines
the 1/1 part finished...

##################################################
cat: cmd_polyte.sh: No such file or directory
successfully execute: split 5383_B.fq -l 1057832 -d -a 2 sub
successfully execute: cat cmd_overlap.sh | xargs -i -P 48 bash -c "{}";
successfully execute: cat sub.map > all_reads_sort.map
successfully execute: rm sub;
successfully execute: python /home/StrainXpress/scripts/get_readnames.py 5383_B.fq readnames.txt
successfully execute: python /home/StrainXpress/scripts/bin_pointer_limited_filechunks_shortpath.py all_reads_sort.map readnames.txt 15000 strainxpress 48
successfully execute: python /home/StrainXpress/scripts/getclusters.py strainxpress_max15000_final 48
successfully execute: python /home/StrainXpress/scripts/get_fq_cluster.py strainxpress_max15000_final_clusters_grouped.json 5383_B.fq /scratch/tmp.1268209/reads/fq_15000
successfully execute: rm -rf Chunkfile; rm strainxpress_max15000_final_clustersizes.json strainxpress_max15000_final_clusters_unchained.json strainxpress_max15000_final_clusters.json
successfully execute: cat cmd_polyte.sh | xargs -i -P 48 bash -c "{}";
Traceback (most recent call last):
  File "/home/StrainXpress/scripts/strainxpress.py", line 171, in <module>
    sys.exit(main())
  File "/home/StrainXpress/scripts/strainxpress.py", line 109, in main
    execute(cmd_merge_contigs)
  File "/home/StrainXpress/scripts/strainxpress.py", line 161, in execute
    with open("output.txt","r") as file:
FileNotFoundError: [Errno 2] No such file or directory: 'output.txt'

jsgounot commented 2 years ago

Yes I just hit this error as well when trying to reproduce my previous results.

kangxiongbin commented 2 years ago

Can you check whether the folders ./fq_15000/*/ contain contigs.fasta? Or can you rerun it with an explicit insert size (I think the insert size of your reads is 300):

python ~/StrainXpress/scripts/strainxpress.py -fq 5383_B.fq -fast -t 48 -average_read_len 150 -insert_size 300
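The check requested above can be scripted. The following is a minimal sketch, assuming the layout mentioned in this thread (one sub-folder per read cluster under fq_15000, each expected to hold a contigs.fasta). The function name clusters_missing_contigs is hypothetical and is not part of StrainXpress.

```python
import sys
from pathlib import Path


def clusters_missing_contigs(cluster_root):
    """Return (all cluster folders, folders lacking a non-empty contigs.fasta).

    Assumes one sub-folder per cluster directly under cluster_root,
    as suggested by the ./fq_15000/*/ pattern in this thread.
    """
    root = Path(cluster_root)
    if not root.is_dir():
        return [], []
    cluster_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    missing = []
    for folder in cluster_dirs:
        contigs = folder / "contigs.fasta"
        # Treat an absent or zero-byte file as "no assembly for this cluster".
        if not contigs.is_file() or contigs.stat().st_size == 0:
            missing.append(folder)
    return cluster_dirs, missing


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "fq_15000"
    dirs, missing = clusters_missing_contigs(target)
    print(f"{len(dirs)} cluster folders, {len(missing)} without contigs.fasta")
    for folder in missing:
        print("missing:", folder)
```

If the script reports 0 cluster folders, the clustering step produced nothing to assemble, which points at the input reads rather than at the assembly stage.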

cifuj commented 2 years ago

Same error and the folder fq_15000/ is empty.

kangxiongbin commented 2 years ago

Is there any folder under fq_15000? Does it contain any fq files? Could you show some of the content of your fq file (for example, the read names)? I think these reads may not have been clustered.

cifuj commented 2 years ago

There is a fq_15000 folder, but nothing inside. These are the files I have obtained so far:

all.contigs_15000.fasta (empty)
all_reads_sort.map
cmd_overlap.sh
out.txt
readnames.txt (empty)

These are the first 50 read names: reads_names_20220407.txt

kangxiongbin commented 2 years ago

Yes. Because the reads were not clustered, StrainXpress cannot assemble them (fq_15000 is empty). Now I know the problem: in the cluster step, StrainXpress cannot identify the read names. Could you change your read names as shown here?

Previously:
@VH00578:5:AAATVMCM5:1:1102:73125:16884 1:N:0:ACGGACTT+TTGATCCG
@VH00578:5:AAATVMCM5:1:1102:73125:16884 2:N:0:ACGGACTT+TTGATCCG

Renamed:
@VH00578_73125_16884/1
@VH00578_73125_16884/2

thx

clb21565 commented 2 years ago

advice on how to reformat?

kangxiongbin commented 2 years ago

Do you have Perl installed? Taking the format above as an example, you can convert it directly:

perl -ane'if(/^\@/){@a = split/\:/; $b = (split/\s/,$a[-4])[-1]; print"$a[0]$a[-5]$b\n";}else{print}' fq_file > new_fq_file

There may be other approaches as well.
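One such alternative is a short Python script. The sketch below is not the maintainer's method. It assumes standard 4-line FASTQ records and Illumina-style headers like the ones quoted in this thread, and it produces the @instrument_x_y/read form shown in the renaming example above. The names HEADER, rename_header and rename_fastq are hypothetical.

```python
import re
import sys

# Illumina-style header, as in the examples quoted in this thread:
# @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
HEADER = re.compile(
    r"^@(?P<instrument>[^:\s]+):[^:\s]+:[^:\s]+:[^:\s]+:[^:\s]+:"
    r"(?P<x>\d+):(?P<y>\d+)\s+(?P<read>[12]):"
)


def rename_header(line):
    """Turn '@VH00578:...:73125:16884 1:N:0:...' into '@VH00578_73125_16884/1'.

    Headers that do not match the Illumina pattern are returned unchanged.
    """
    line = line.rstrip("\n")
    match = HEADER.match(line)
    if match is None:
        return line
    return "@{instrument}_{x}_{y}/{read}".format(**match.groupdict())


def rename_fastq(lines):
    """Yield FASTQ lines, rewriting only the header (every 4th line).

    Working on record positions rather than on a leading '@' avoids
    touching quality strings that happen to start with '@'.
    """
    for index, line in enumerate(lines):
        if index % 4 == 0:
            yield rename_header(line)
        else:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python rename_reads.py fq_file new_fq_file
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out_line in rename_fastq(src):
            dst.write(out_line + "\n")
```

Note that the target format keeps only the instrument name and the x/y coordinates. Since those coordinates are relative to a tile, two reads from different tiles or lanes could end up with the same name; if that turns out to matter for a given data set, adding the lane and tile fields to the new name would be a reasonable variation, though it goes beyond the format given in this thread.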

clb21565 commented 2 years ago

perl -ane'if(/^@/){@A = split/:/; $b = (split/\s/,$a[-4])[-1]; print"$a[0]$a[-5]$b\n";}else{print}' fq_file > new_fq_file

didn't work, this completely erased all of the headers

how have you been generating the fastq files for input to strainxpress?

kangxiongbin commented 2 years ago

You need to adapt the code to the read names in your fq file. Could you show me a read name from your fq file?

clb21565 commented 2 years ago

Thank you! Here is one:

@A00201R:332:H2LYWDRXY:1:1101:13819:35446 1:N:0:TCTATCCTAA+GAGAGGTTCG

Out of curiosity, how do you typically generate the interleaved fastq that ends up going to StrainXpress? Is it from samtools fastq, or another pipeline?

kangxiongbin commented 2 years ago

I hope this works. Previously, I made some mistakes in the code.

perl -ane'if(/^@/){@a = split/\:/; $b = (split/\s/,$a[-4])[-1]; print"$a[0]$a[-5]/$b\n";}else{print}' fq_file > new_fq_file

I found that GitHub changes what I type: the a after @ must be lower case (@a); the upper-case form is incorrect.

As for how I generate the fastq: the fq files I download from the sequencing machine already look like this, and I don't convert them with any other tools.

@S0R0/1
@S0R0/2
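Before starting a long run, it may help to confirm that the read names already follow this /1 and /2 pattern. The sketch below is an assumption-laden pre-flight check, not part of StrainXpress: it assumes an interleaved, 4-line-per-record FASTQ and only inspects the first read pairs. The function name check_read_names is hypothetical.

```python
import itertools
import sys


def check_read_names(lines, max_pairs=1000):
    """Check the first max_pairs read pairs of an interleaved 4-line FASTQ.

    Returns a list of human-readable problems; an empty list means every
    inspected pair is named <name>/1 followed by <name>/2.
    """
    head = itertools.islice(lines, max_pairs * 8)  # 2 reads x 4 lines per pair
    headers = [line.strip() for i, line in enumerate(head) if i % 4 == 0]
    problems = []
    if len(headers) % 2:
        problems.append("odd number of reads; file may not be interleaved")
    for first, second in zip(headers[0::2], headers[1::2]):
        ok = (
            first.endswith("/1")
            and second.endswith("/2")
            and first[:-2] == second[:-2]  # same name apart from the suffix
            and " " not in first           # no Illumina-style comment field
        )
        if not ok:
            problems.append(f"unexpected pair: {first!r} / {second!r}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for problem in check_read_names(handle):
            print(problem)
```

An empty report only says that the names look like the example above; whether StrainXpress accepts other naming schemes is not stated in this thread.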

clb21565 commented 2 years ago

[image: image.png]

didn't seem to work again :/

clb21565 commented 2 years ago

that did it!! thank you!!

kangxiongbin commented 2 years ago

Good!