wtsi-hpag / Scaff10X

Pipeline for scaffolding and breaking a genome assembly using 10x Genomics linked reads
MIT License

segfault #12

Open wheaton5 opened 5 years ago

wheaton5 commented 5 years ago

If I run it without parameters, I get the usage message as expected.

Then, if I run it with parameters:

scaff10x -nodes 30 funestus_redbean.purged.fa bamtofastq_S1_L000_R1_001.fastq.gz bamtofastq_S1_L000_R2_001.fastq.gz hybrid_redbean2.purged.scaff.fa

I get:

./scaffme2.sh: line 1: 31890 Segmentation fault

I'm on farm4, in directory /lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis. I created the FASTQ files with 10x's bamtofastq from a longranger run against a different assembly; bamtofastq should put the reads back in the same format they had as input to longranger.

zning-sanger commented 5 years ago

Hi Haynes,

I had a quick look at your data and ran the dataset myself. It seems that your problem was due to the way you ran the pipeline. 10X reads normally come as a number of files in a directory, and we suggest that users make a file such as file.dat:

q1=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S1_L001_R1_001.fastq.gz
q2=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S1_L001_R2_001.fastq.gz
q1=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S2_L001_R1_001.fastq.gz
q2=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S2_L001_R2_001.fastq.gz
q1=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S3_L001_R1_001.fastq.gz
q2=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S3_L001_R2_001.fastq.gz
q1=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S4_L001_R1_001.fastq.gz
q2=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S4_L001_R2_001.fastq.gz
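Writing such a file by hand for many lanes is error-prone, so here is a minimal Python sketch that generates one from a read directory. It assumes the naming pattern shown above (each `..._R1_NNN.fastq.gz` has a mate `..._R2_NNN.fastq.gz` in the same directory) and one `q1=`/`q2=` entry per line. The script name `make_dat.py` and the function `make_dat` are hypothetical and not part of Scaff10X. It resolves the directory to an absolute path, since Scaff10X needs the full path to each file.

```python
"""Hypothetical helper (not part of Scaff10X): write a file.dat for Scaff10X.

Assumes the 10X FASTQ naming used in this thread, where each ..._R1_NNN.fastq.gz
has a mate ..._R2_NNN.fastq.gz in the same directory, and that Scaff10X reads
one q1=/q2= entry per line.
"""
import sys
from pathlib import Path


def make_dat(read_dir):
    """Return file.dat lines (q1= then q2=) for every R1/R2 pair in read_dir."""
    root = Path(read_dir).resolve()  # absolute path: Scaff10X needs full paths
    lines = []
    for r1 in sorted(root.glob("*_R1_*.fastq.gz")):
        # Replace only the last "_R1_" so the rest of the file name is untouched.
        head, _, tail = r1.name.rpartition("_R1_")
        r2 = r1.with_name(f"{head}_R2_{tail}")
        if not r2.is_file():
            raise FileNotFoundError(f"no R2 mate found for {r1}")
        lines.append(f"q1={r1}")
        lines.append(f"q2={r2}")
    if not lines:
        raise FileNotFoundError(f"no *_R1_*.fastq.gz files in {root}")
    return lines


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python make_dat.py /path/to/reads_10x > file.dat
    print("\n".join(make_dat(sys.argv[1])))
```

Failing loudly on a missing R2 mate is deliberate: a silently unpaired lane is much harder to diagnose once it is inside a multi-hour run.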

You then run the code like this:

scaff10x -nodes 30 {options} -data file.dat funestus_redbean.purged.fa funestus_redbean.scaff10x.fa

You will get a few files: funestus_redbean.scaff10x.fa, funestus_redbean.scaff10x.fa.agp, and funestus_redbean.scaff10x.fa.cov.

If you use -plot funestus_redbean-bclength.png, you get the barcode length distribution as well.

Another way to run the code is the way you did:

scaff10x -nodes 30 funestus_redbean.purged.fa bamtofastq_S1_L000_R1_001.fastq.gz bamtofastq_S1_L000_R2_001.fastq.gz hybrid_redbean2.purged.scaff.fa

However, the files bamtofastq_S1_L000_R1_001.fastq.gz and bamtofastq_S1_L000_R2_001.fastq.gz first have to be processed with scaff_reads, which puts the barcode tags at the end of the read names. It looks like you didn't do that.

As said before, please use -data file.dat.

Finally, the Scaff10X results are not good; see /lustre/scratch117/sciops/team117/hpag/zn1/project/worm

A lot of contigs have no read coverage at all. I don't know whether this is a sample issue or whether you didn't extract all the reads.

Zemin

wheaton5 commented 5 years ago

I wanted to run it this way (with the .dat file) but got a segfault with that too. Since the command-line usage (printed when no arguments are given) mainly shows the other way of running it, I tried that next. Going back to the .dat way, I still get a segfault:

Segmentation fault when running

scaff10x -nodes 30 -plot barcode_length.png -data scaffinput.dat An_arab_wm_A_ref.fa

in directory /lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis/falcon/haplomerger

zning-sanger commented 5 years ago

Your input file:

q1=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R1_001.fastq.gz
q2=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R2_001.fastq.gz
q1=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R1_002.fastq.gz
q2=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R2_002.fastq.gz
q1=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R1_003.fastq.gz
q2=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R2_003.fastq.gz
q1=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R1_004.fastq.gz
q2=10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R2_004.fastq.gz

ls -l 10x_reads/longranger222_wgs_29692_BAdASS7880765_arabiensis_freebayes_LibraryNotSpecified_1_unknown_fc/bamtofastq_S1_L000_bamtofastq_S1_L000_R1_001.fastq.gz
ls: cannot access

You need to give the full path, for example:

q1=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S1_L001_R1_001.fastq.gz
q2=/lustre/scratch117/sciops/team117/hpag/zn1/project/mammals/mBosTau1/reads_10x/mBosTau1_S1_L001_R2_001.fastq.gz
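Because a bad path is only discovered after the run starts, a quick pre-flight check can save time. Below is a minimal Python sketch that flags entries that are relative or point at files that do not exist, plus unequal q1/q2 counts. The script name `check_dat.py` and the function `check_dat` are hypothetical, not a Scaff10X tool, and the sketch assumes one `q1=PATH` or `q2=PATH` entry per line.

```python
"""Hypothetical pre-flight check for a Scaff10X -data file (not part of Scaff10X).

Flags the problems seen in this thread: relative paths and paths that do not
exist. Assumes one key=path entry per line, with keys q1 and q2.
"""
import os
import sys


def check_dat(dat_path):
    """Return a list of human-readable problems found in dat_path."""
    problems = []
    counts = {"q1": 0, "q2": 0}
    with open(dat_path) as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            key, sep, path = line.partition("=")
            if not sep or key not in counts:
                problems.append(f"line {lineno}: expected q1=PATH or q2=PATH")
                continue
            counts[key] += 1
            if not os.path.isabs(path):
                problems.append(f"line {lineno}: {path} is not an absolute path")
            elif not os.path.isfile(path):
                problems.append(f"line {lineno}: {path} does not exist")
    if counts["q1"] != counts["q2"]:
        problems.append(f"{counts['q1']} q1 entries but {counts['q2']} q2 entries")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_dat.py scaffinput.dat
    issues = check_dat(sys.argv[1])
    print("\n".join(issues) if issues else "OK")
    sys.exit(1 if issues else 0)
```

Run against the input file above, every entry would be reported as not absolute, which is the same problem the `ls: cannot access` output points to.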

Zemin

zning-sanger commented 5 years ago

I have a debug version with which you can reuse an existing align.dat file:

/nfs/users/nfs_z/zn1/src/Scaff10X/src/scaff10x -nodes 2 {options} -plot bclength.png -noalign /lustre/scratch117/sciops/team117/hpag/zn1/project/worm/tmp_rununik_57281/align.dat funestus_redbean.purged.fa funestus_redbean.scaff10x-t2.fa > try.out

Note: give the full path of align.dat

Zemin

wheaton5 commented 5 years ago

OK, I'm using full paths for all inputs now:

./scaffme.sh
File not in the working directory! File -data not found and please copy it to your working directory!

But it is in the current directory:

pwd
/lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis/falcon/haplomerger
(base) haplomerger: ls scaffinput.dat
scaffinput.dat

The contents of scaffme.sh:

cat scaffme.sh
/nfs/users/nfs_s/sm15/dev/Scaff10X-4.1/src/scaff10x -nodes 30 -plot barcode_lengtg.png -data /lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis/falcon/haplomerger/scaffinput.dat /lustre/scratch118/malaria/team222/hh5/projects/analysis/assembly/arabiensis/falcon/haplomerger/An_arab_wm_A_ref.fa

zning-sanger commented 5 years ago

Change scaffme.sh to this:

/nfs/users/nfs_s/sm15/dev/Scaff10X-4.1/src/scaff10x -nodes 30 -plot barcode_lengtg.png -data scaffinput.dat An_arab_wm_A_ref.fa An_arab_wm_A_ref-scaff10x.fa > try.out

wheaton5 commented 5 years ago

./scaffme2.sh
(base) haplomerger: cat try.out
File not in the working directory! File -data not found and please copy it to your working directory!
(base) haplomerger: cat scaffme2.sh
/nfs/users/nfs_s/sm15/dev/Scaff10X-4.1/src/scaff10x -nodes 30 -plot barcode_lengtg.png -data scaffinput.dat An_arab_wm_A_ref.fa An_arab_wm_A_ref-scaff10x.fa > try.out

zning-sanger commented 5 years ago

I don't know whether Shane's installation has the recent "-plot" function. Please try downloading and installing the package yourself, or try

/nfs/users/nfs_z/zn1/src/Scaff10X/src/scaff10x

Sorry about all these problems!

zning-sanger commented 5 years ago

Just tried:

/nfs/users/nfs_z/zn1/src/Scaff10X/src/scaff10x -nodes 52 -plot bclength.png -data file.dat funestus_redbean.purged.fa funestus_redbean.scaff10x-t2.fa >

and Shane's code is working!

wheaton5 commented 5 years ago

OK, that seems to be working. Thanks for the help!