schatzlab / crossstitch

Code for phasing SVs with SNPs

phasing plant genome #3

Closed yilunhuangyue closed 6 years ago

yilunhuangyue commented 6 years ago

Hi, thank you for developing this tool. I am wondering whether it can be used to phase plant genomes. Also, I have PacBio reads and Illumina reads; can I generate phased_snps.vcf from these two kinds of reads alone? Thanks a lot for any suggestions.

mschatz commented 6 years ago

Most of it works independently of the genome, especially the step that uses the SNPs to phase SVs. However, the very last step, which assembles the diploid genome, only supports human right now, and only because there is some special logic in place to handle the sex chromosomes correctly. If this is a barrier for you, let me know and I can make some recommendations. And yes, you can use just Illumina and PacBio reads, although the phase blocks will be limited by the length of the PacBio reads. In human, with a heterozygosity rate of ~0.1%, this gives a phase block N50 of around 250 kbp to 500 kbp, although I would expect it to be more successful in a plant with a higher rate of heterozygosity.

Good luck!

Mike
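For context on that estimate: at ~0.1% heterozygosity there is on average one heterozygous SNP per ~1 kbp, and a read can only link alleles if it spans at least two of them, so higher heterozygosity gives each read more informative sites. Phase block length is then summarized as an N50. The sketch below computes an N50 from a list of phase blocks; the half-open (start, end) interval representation and the example blocks are hypothetical, not crossstitch output.

```python
"""Minimal N50 sketch for phase blocks (hypothetical input format)."""


def n50(lengths):
    """Return the largest length L such that blocks of length >= L
    together cover at least half of the total length (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def block_lengths(blocks):
    """Convert half-open (start, end) phase-block intervals, in bp, into lengths."""
    return [end - start for start, end in blocks]


if __name__ == "__main__":
    # Hypothetical phase blocks, for illustration only.
    blocks = [(0, 300_000), (300_000, 700_000), (700_000, 800_000), (800_000, 1_000_000)]
    print(n50(block_lengths(blocks)))  # prints 300000
```

The four example blocks total 1,000 kbp; the 400 kbp and 300 kbp blocks together pass the 500 kbp halfway mark, so the N50 is 300 kbp.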

yilunhuangyue commented 6 years ago

Hi Mike, I have tried crossstitch on my genome with refine set to "1", but got an error message: "~/software/crossstitch/sv/process.sh: line 6: /crossstitch/testout2/inserts/.txt.: No such file or directory". I checked the source code and found that after running "java -cp "${BINDIR}" ReadFinder $WORKINGDIR/"${vcfFile}" $OUTDIR/inserts" in go.sh, no files were created in the "inserts" folder, but no error message was printed either, so I don't know where the problem is. Could you give me any suggestions?

Best wishes,

Huang

mschatz commented 6 years ago

Hi Mike,

Is the crossstitch code up to date in the github repo? Otherwise, do you have any suggestions as to why this might be failing?

Thanks! Mike


mschatz commented 6 years ago

Hi Huang,

I think the first thing is to confirm that there are insertions in the VCF file. Can you double check that is the case?

Thanks!

Mike
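One quick way to do that check is to tally the SV types in the VCF. The sketch below is a minimal standalone parser, not part of crossstitch; it assumes the SV caller records the type in the standard INFO/SVTYPE key (as Sniffles does) and falls back to a symbolic ALT such as <INS> when that key is missing. The script name in the usage line is hypothetical.

```python
import gzip
import sys
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_info(info):
    """Parse an INFO column such as 'SVTYPE=INS;SVLEN=120;PRECISE' into a dict.
    Flag entries without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def count_sv_types(lines):
    """Count data records per SV type: INFO/SVTYPE if present, else a symbolic ALT."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        alt, info = cols[4], parse_info(cols[7])
        svtype = info.get("SVTYPE")
        if not isinstance(svtype, str):
            svtype = alt[1:-1] if alt.startswith("<") and alt.endswith(">") else "OTHER"
        counts[svtype] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_vcf(sys.argv[1]) as handle:
        for svtype, n in sorted(count_sv_types(handle).items()):
            print(svtype, n)
```

Usage (hypothetical script name): `python count_svtypes.py map.sniffles.vcf`. A non-zero INS count confirms that the VCF contains insertions for ReadFinder to process.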


yilunhuangyue commented 6 years ago

Hi, I downloaded the latest version and tried again, but got the same error: "~/software/crossstitch-master/sv/process.sh: line 7: /corssstitch/testout3/inserts/.txt.: No such file or directory". By the way, my VCF file was created by Sniffles and it does contain "INS" records. I also noticed that the log file says "Number of insertions found: 5561 Number of insertions with supporting reads found: 0". Is there a problem with my VCF file or my BAM alignment file?

Thanks

Huang

yilunhuangyue commented 6 years ago

Hi, I reran Sniffles with the parameter -n -1 and ran crossstitch again, and that error is gone now. Sorry for my mistake. But I hit another error when running VCFEditor.java: "Exception in thread "main" java.util.NoSuchElementException at java.util.Scanner.throwFor(Scanner.java:862) at java.util.Scanner.next(Scanner.java:1371) at VCFEditor.main(VCFEditor.java:114)". Thanks for any suggestions.
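The earlier log line "Number of insertions with supporting reads found: 0" appears to reflect a VCF without per-call read names, which is what rerunning Sniffles with -n -1 changed. A minimal sketch for checking a VCF before running crossstitch, assuming (as in Sniffles 1.x output) that supporting read names are written as a comma-separated INFO/RNAMES list; the function names are hypothetical and this is not how crossstitch itself reads the file.

```python
import sys


def info_fields(info_column):
    """Split a VCF INFO column into a dict; flags map to an empty string."""
    fields = {}
    for item in info_column.split(";"):
        key, _, value = item.partition("=")
        if key and key != ".":
            fields[key] = value
    return fields


def insertion_read_support(lines, key="RNAMES"):
    """Return (insertions, insertions listing at least one supporting read name).

    `key` is the INFO field assumed to hold comma-separated read names
    (RNAMES in Sniffles 1.x output when run with -n)."""
    total = with_reads = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        fields = info_fields(cols[7])
        if fields.get("SVTYPE") != "INS" and cols[4] != "<INS>":
            continue  # only insertions are counted
        total += 1
        names = [name for name in fields.get(key, "").split(",") if name]
        if names:
            with_reads += 1
    return total, with_reads


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        total, with_reads = insertion_read_support(handle)
    print(f"insertions: {total}, with read names: {with_reads}")
```

If the second number is 0 while the first is large, the VCF most likely lacks read names for the insertion calls.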

mschatz commented 6 years ago

Would it be possible for you to share the data? That would really help us investigate the issues.

Thanks!

Mike


yilunhuangyue commented 6 years ago

Hi Mike,

My data is unpublished, so I did not post this message on GitHub. I am sorry if this causes you any trouble.

You can download my data from http://211.69.140.136/download_tmp/. The original data is too big, so I only uploaded part of it. If you can't download the data or need more information, please let me know.

In addition, I commented out the code that handles the sex chromosomes when running crossstitch.

Thanks a lot for your kind help.

Huang



mschatz commented 6 years ago

Thank you, Huang. I'm cc'ing my student Mike, who worked on that part of the pipeline. Mike, can you please take a look?

Mike


fritzsedlazeck commented 6 years ago

Dear Huang,

thanks for sharing. I just want to point out that this is visible on GitHub.

Thanks Fritz


mschatz commented 6 years ago

Hi Huang,

It appears that the alignment file (map.bam) has alignments to chromosomes other than chr1 (such as chr4), but these sequences are not present in the reference file you sent us (ref.fa). Crossstitch requires that every sequence with aligned reads be present in the provided reference file. Could you please check whether this is also the case for your version of the files?

Thanks,

Mike
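A minimal sketch of that check, assuming the BAM is sorted and indexed so that `samtools idxstats map.bam` can report mapped-read counts per sequence, and comparing those names against the '>' headers in ref.fa. The script and its command-line arguments are hypothetical; only the samtools idxstats output layout (name, length, mapped, unmapped, with a final '*' row for unplaced reads) is relied on.

```python
import sys


def contigs_with_alignments(idxstats_lines):
    """Names of sequences with at least one mapped read in `samtools idxstats`
    output (tab-separated: name, length, mapped, unmapped). The '*' row is skipped."""
    names = []
    for line in idxstats_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[0] != "*" and int(cols[2]) > 0:
            names.append(cols[0])
    return names


def fasta_names(fasta_lines):
    """Sequence names from FASTA header lines (first whitespace-delimited token)."""
    return [line[1:].split()[0] for line in fasta_lines
            if line.startswith(">") and line[1:].strip()]


def missing_from_reference(idxstats_lines, fasta_lines):
    """Sequences that have alignments but are absent from the reference FASTA."""
    reference = set(fasta_names(fasta_lines))
    return [name for name in contigs_with_alignments(idxstats_lines)
            if name not in reference]


if __name__ == "__main__" and len(sys.argv) > 2:
    # argv[1]: saved `samtools idxstats` output; argv[2]: reference FASTA
    with open(sys.argv[1]) as idx, open(sys.argv[2]) as fasta:
        missing = missing_from_reference(idx, fasta)
    print("missing from reference:", ", ".join(missing) or "none")
```

For a large reference, reading names from the first column of the ref.fa.fai index would avoid scanning every sequence line.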



yilunhuangyue commented 6 years ago

Hi Mike,

The original alignment file was too large, so I only extracted the alignments for chr1. I may have made a mistake when extracting them, so today I uploaded a new alignment file (map.sort.bam) and SV calls (map.sniffles-filt.vcf). I also uploaded the log files from running crossstitch.

And thanks, Fritz, for the kind reminder; I will delete the files once the problem is solved.

Best wishes,

Huang

mschatz commented 6 years ago

Thanks! Mike, can you please try again?

Mike


mschatz commented 6 years ago

Thanks for the updated files! I was able to replicate the error (caused by an issue with the way I was handling relative vs absolute paths), and the most recent commit is a fix for it. Please let me know if there continue to be issues.

Thanks,

Mike



yilunhuangyue commented 6 years ago

Thanks Mike, this step now runs correctly, but there are still two problems.

  1. The last line in go.sh is "rm $WORKINGDIR/hs.log", but I got the error message "rm: cannot remove `/test/hs.log': No such file or directory", so I removed this line, ran the program again, and got the final results.

  2. I noticed that in the refined VCF file, most REF and ALT values were converted to explicit sequences instead of the "N" and <INS> or <DEL> used in the original VCF file. But when running "$BINDIR/scrubvcf.pl $OUTPREFIX.refined.vcf > $OUTPREFIX.scrubbed.vcf) >& $OUTPREFIX.scrubbed.log", only the lines whose ALT is still <INS> or <DEL> were processed. In my case, the scrubvcf.pl log file shows: "## Reported 16 of 1447 variants: <DEL> 0 <INS> 10 <INV> 6"

And in the *.spliced.vcf.svphase file, only these 16 SVs were phased.

Could you please check this?

Thanks,

Huang
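The 16-of-1447 count fits a script that only recognizes symbolic ALT alleles: every record that refinement rewrote as explicit sequence is skipped. The sketch below is a hypothetical illustration of handling both forms, not the updated scrubvcf.pl; for sequence-resolved records it assumes the SV type can be read from the REF/ALT length difference, which covers insertions and deletions but not inversions.

```python
from collections import Counter


def classify_alt(ref, alt):
    """Return (allele form, SV type) for one VCF record.

    Symbolic alleles such as <INS>, <DEL> or <INV> keep their bracketed type.
    For explicit sequences, a longer ALT is treated as an insertion and a
    shorter ALT as a deletion; equal lengths are reported as OTHER."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic", alt[1:-1]
    if len(alt) > len(ref):
        return "sequence", "INS"
    if len(alt) < len(ref):
        return "sequence", "DEL"
    return "sequence", "OTHER"


def tally_alleles(lines):
    """Count (allele form, SV type) pairs over the data lines of a VCF."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        counts[classify_alt(cols[3], cols[4])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical records: one symbolic, two sequence-resolved.
    example = [
        "chr1\t100\tsv1\tN\t<INV>\t.\tPASS\tSVTYPE=INV\n",
        "chr1\t200\tsv2\tA\tACGTACGTAC\t.\tPASS\tSVTYPE=INS\n",
        "chr1\t300\tsv3\tACGTACGTAC\tA\t.\tPASS\tSVTYPE=DEL\n",
    ]
    for key, n in sorted(tally_alleles(example).items()):
        print(key, n)
```

Where an INFO/SVTYPE value is present, preferring it over the length heuristic would be the safer choice.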

mkirsche commented 6 years ago

Hi,

I updated the scrubbing and splicing scripts to account for the format of REF/ALT alleles we are currently using, and it seems that all of the SVs now make it into the spliced VCF file. Could you please let me know whether it is working for you?

Thanks, Mike


yilunhuangyue commented 6 years ago

Thanks Mike, it is working now. By the way, how should I cite crossstitch if I use it in my research?

Best Wishes, Huang

mschatz commented 6 years ago

Great! Please cite the GitHub repo for now; we are working on a paper this summer to describe it in more detail.

Best

Mike
