wtsi-hpag / Scaff10X

Pipeline for scaffolding and breaking a genome assembly using 10x genomics linked-reads
MIT License

Segmentation fault #9

Open francicco opened 5 years ago

francicco commented 5 years ago

Hi,

I just downloaded and installed the program:

Downloading and installing BWA
 BWA succesfully installed!

Downloading and installing Smalt
 Smalt succesfully installed!
Downloading and installing pigz
 pigz succesfully installed!

Compiling scaff10X sources

Checking installation:
 Congrats: installation successful!

But when I execute it I get a 317644 Segmentation fault

What's wrong? Thanks F

EdHarry commented 5 years ago

Hi Francicco,

Are you running scaff10x on the same machine that you compiled it on?

Which version of gcc are you using, and is it configured for the machine you're running scaff10x on?

francicco commented 5 years ago

Hi @EdHarry,

Yes, it is the same machine. gcc/5.2.0 F

EdHarry commented 5 years ago

Ok, can you confirm it's the main scaff10x binary (src/scaff10x) that is seg faulting?

francicco commented 5 years ago

Yes!

EdHarry commented 5 years ago

Ok, can you try some of the other binaries, say src/scaff-bin/scaff_matrix and src/scaff-bin/scaff_FilePreProcess? Do they also seg fault?

francicco commented 5 years ago

They all return the help message, scaff10x included. F

EdHarry commented 5 years ago

I thought you said the main binary seg faulted? Is it seg faulting or printing a help message?

francicco commented 5 years ago

It is seg faulting when I execute it with the input files F

EdHarry commented 5 years ago

Ok, could you tell me the full command you're executing, and could you provide the log of all the output please.
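One way to collect what is being asked for here is to run the command through a small wrapper that saves all output and reports how the process ended. The following is only a sketch: `run_and_log` is a hypothetical helper, not part of Scaff10X, and it assumes a POSIX system, where a negative return code means the process was killed by a signal. Because scaff10x launches other binaries, a crash in one of those may show up only as a "Segmentation fault" line in the log rather than in the return code.

```python
import signal
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd (a list of arguments), write its combined stdout/stderr to
    log_path, and return a short description of how the process ended."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    if proc.returncode < 0:
        # POSIX only: a negative return code is minus the signal number.
        sig = -proc.returncode
        try:
            name = signal.Signals(sig).name
        except ValueError:
            name = "signal %d" % sig
        return "killed by " + name
    return "exit status %d" % proc.returncode
```

Calling it with the full scaff10x argument list and a log file name of your choice gives a log that can be attached to the issue as-is.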

EdHarry commented 5 years ago

also, could you try the last stable release https://github.com/wtsi-hpag/Scaff10X/archive/v4.1.tar.gz and see if that works for you

francicco commented 5 years ago

This is the command line:

scaff10x -nodes 32 Diul10x.1.80.curated.2r0.05e30000l5a0.9.BWA.ARCS.fasta /home/fc464/rds/rds-shm37-helixmbodyw/HeliconiiniProg/10xChromium/fastq/Diul/Diul.LinkedReads.R1.fastq.gz /home/fc464/rds/rds-shm37-helixmbodyw/HeliconiiniProg/10xChromium/fastq/Diul/Diul.LinkedReads.R2.fastq.gz Diul10x.1.80.curated.2r0.05e30000l5a0.9.BWA.ARCS.scaff10x.fasta

I'll test the other version now.

F

francicco commented 5 years ago

That version works. Specifically the binary in src not in src/scaff-bin

francicco commented 5 years ago

One last question: how do I have to format the fastq files? Does the barcode need to be included in the read sequence or in the header? If in the header, how?

Thanks a lot F

zning-sanger commented 5 years ago

Hi Francicco,

Could you send me a listing of all the files in the working directory, either here or via email ( zn1@sanger.ac.uk )? It would tell me roughly at which stage it failed. Just run

ls -lrt tmp*

Zemin

EdHarry commented 5 years ago

> One last question: how do I have to format the fastq files? Does the barcode need to be included in the read sequence or in the header? If in the header, how?
>
> Thanks a lot F

You don't need to format the fastq files; the barcodes will be extracted from the reads automatically.

francicco commented 5 years ago

So, this is how my fastq are formatted:

@D00352:461:CCYW4ANXX:2:1101:16223:19699_AAAAAAAAACCAGAAA
TAACTAAGAATTCGAAAGAAGATTCGAACTCGCGCCTCCTGAATACCGTCCGGGCGCTCTCACCACTAAGCCATGCGTTCTACTACAAGCTGCGTCGAAATT
+
BFB<<FF<F/F///<FF/FF/<<<//</</<B<<B/BF<F<F/<<FF////F///7<BF</<B7BF//<BFF<B//7B<B/7B/B/BB/B/BF/////B7F<
@D00352:461:CCYW4ANXX:3:1205:1151:93622_AAAAAAAAAGTACCAA
GAAAAACAAAAAAAAAACGAACTACTTTTATCAGATAACATTGTTCTTTGAGCACATTTACAAAATAGCGATTTCATTTCAAAAAAACTAATAATTCATTGT
+
BFBF/FFFFBFFFFFFF<//<B/FF/<F/F<////BFFFFB//</<FFFF<<BFBFFFBFFFFFFFFFB//B/7/FB/<//7BFBBBB7FB<FFFFB7B/77
@D00352:461:CCYW4ANXX:4:2307:1605:2891_AAAAAAAACAATTCCA
ATAGAGGATGAAAGTGGCAGTTCACGTGGCGAAGCCGCGAGCGGGTGGCTAGTAAACAATAAGCGAGTTATCTCACCAGGTAAAATGTTGAAACATGATTCA
+
FFFFFFFFFFFFFFFBFBF<<F/FFB/F/FFFFFFFBFFFFFFF<<BBFFFFFFFFFFFFFFFFFBFBFFFFFFBFFFFFFFFFFFBFFBFFFFFBB/FFFF

It's the format I used for ARCS.
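In the records above, the barcode is appended to the read name after a final underscore. A minimal Python sketch of reading it back out, assuming every header follows that layout (`split_barcode` is a hypothetical helper for checking your own files, not something scaff10x or ARCS provides):

```python
from typing import Tuple


def split_barcode(header: str) -> Tuple[str, str]:
    """Split a FASTQ header of the form '@<read name>_<barcode>'.

    Assumes the barcode is everything after the last underscore of the
    first whitespace-delimited field, as in the records shown above.
    """
    fields = header[1:].split() if header.startswith("@") else []
    if not fields:
        raise ValueError("not a FASTQ header: %r" % header)
    read_id, sep, barcode = fields[0].rpartition("_")
    if not sep or not barcode:
        raise ValueError("no '_<barcode>' suffix in: %r" % header)
    return read_id, barcode


read_id, barcode = split_barcode(
    "@D00352:461:CCYW4ANXX:2:1101:16223:19699_AAAAAAAAACCAGAAA"
)
print(read_id, barcode)
```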

I tested break10x with those reads using this command line:

break10x -nodes $THREADS $ASSEMBLY.fasta $R1 $R2 \
                $ASSEMBLY.break10x.fasta \
                $ASSEMBLY.BreakPoints

It produced the output files. The BreakPoints file contains 38 lines. Now I'm testing scaff10x.

From what I understood, break10x finds errors and scaff10x scaffolds the contigs. On top of that, I think you suggest also using ARCS. Is that right?

Do you still want the files? The newest version doesn't generate any files; it crashes immediately.

F

zning-sanger commented 5 years ago

Yes, break10x finds errors and scaff10x scaffolds the contigs. Whether to use ARCS is your choice, but it is good to compare.

Zemin

zning-sanger commented 5 years ago

I just tested v4.2 on my computers and it works with various options.

francicco commented 5 years ago

I'll try to recompile it. Thanks a lot

F

zning-sanger commented 5 years ago

Reads in Diul.LinkedReads.R1.fastq.gz

are in this format:

@D00352:461:CCYW4ANXX:2:1101:16223:19699_AAAAAAAAACCAGAAA
TAACTAAGAATTCGAAAGAAGATTCGAACTCGCGCCTCCTGAATACCGTCCGGGCGCTCTCACCACTAAGCCATGCGTTCTACTACAAGCTGCGTCGAAATT
+
BFB<<FF<F/F///<FF/FF/<<<//</</<B<<B/BF<F<F/<<FF////F///7<BF</<B7BF//<BFF<B//7B<B/7B/B/BB/B/BF/////B7F<
@D00352:461:CCYW4ANXX:3:1205:1151:93622_AAAAAAAAAGTACCAA
GAAAAACAAAAAAAAAACGAACTACTTTTATCAGATAACATTGTTCTTTGAGCACATTTACAAAATAGCGATTTCATTTCAAAAAAACTAATAATTCATTGT
+
BFBF/FFFFBFFFFFFF<//<B/FF/<F/F<////BFFFFB//</<FFFF<<BFBFFFBFFFFFFFFFB//B/7/FB/<//7BFBBBB7FB<FFFFB7B/77
@D00352:461:CCYW4ANXX:4:2307:1605:2891_AAAAAAAACAATTCCA
ATAGAGGATGAAAGTGGCAGTTCACGTGGCGAAGCCGCGAGCGGGTGGCTAGTAAACAATAAGCGAGTTATCTCACCAGGTAAAATGTTGAAACATGATTCA
+
FFFFFFFFFFFFFFFBFBF<<F/FFB/F/FFFFFFFBFFFFFFF<<BBFFFFFFFFFFFFFFFFFBFBFFFFFFBFFFFFFFFFFFBFFBFFFFFBB/FFFF

The format used for ARCS should also work for scaff10x. Make sure that your target assembly is in the working directory and that the two read files are linked there correctly.
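As an illustration of that setup, here is a small Python sketch that creates symbolic links to the input files inside the working directory, roughly what `ln -s TARGET LINK_NAME` does for each file. The helper name `link_into_workdir` is hypothetical, and whether scaff10x strictly requires the links is not stated in this thread.

```python
import os


def link_into_workdir(paths, workdir="."):
    """Create a symbolic link in workdir for each file in paths.

    Names that already exist in workdir are left untouched.
    Returns the list of link paths, in the same order as paths.
    """
    links = []
    for path in paths:
        target = os.path.abspath(path)
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        link = os.path.join(workdir, os.path.basename(target))
        if not os.path.lexists(link):
            os.symlink(target, link)
        links.append(link)
    return links
```

After linking, the command can be run from the working directory with the bare file names instead of the long absolute paths.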

francicco commented 5 years ago

You mean running ln TARGET LINK_NAME in the same directory where the assembly is? F

EdHarry commented 5 years ago

Hi Francicco,

If you're still having problems, could you try making small test files from your target assembly and 10x data with a few reads in them (keep them below 1M in size) and see if running with them still causes a seg fault? If it does, could you upload the files here please?
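A rough Python sketch of how such test files could be cut down, assuming gzipped four-line FASTQ records and a plain-text FASTA assembly. Both function names are hypothetical, and taking the first N records is just the simplest choice, not a recommendation from the Scaff10X authors.

```python
import gzip
from itertools import islice


def head_fastq_gz(src, dst, n_reads):
    """Copy the first n_reads four-line records of a gzipped FASTQ file
    to dst (also gzipped). Returns the number of records written."""
    written = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        while written < n_reads:
            record = list(islice(fin, 4))
            if len(record) < 4:  # end of file or truncated record
                break
            fout.writelines(record)
            written += 1
    return written


def head_fasta(src, dst, n_seqs):
    """Copy the first n_seqs sequences of a FASTA file to dst.
    Returns the number of sequences written."""
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                if written == n_seqs:
                    break
                written += 1
            if written:  # skip anything before the first header
                fout.write(line)
    return written
```

Using the same `n_reads` for the R1 and R2 files keeps the pairs in step, provided both files list the reads in the same order. The resulting sizes can be checked with `os.path.getsize` before uploading.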

francicco commented 5 years ago

It looks like I don't have the segmentation fault anymore. Maybe because I generated the soft links.

Version 4.1 worked pretty well. My assembly looks better with break10x and scaff10x added to my pipeline than it did before! Thanks F

zning-sanger commented 5 years ago

Good to hear that.

Zemin