wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

vartrix problem #126

Open sanchezy opened 2 years ago

sanchezy commented 2 years ago

Hi @wheaton5

I ran the souporcell_latest.sif pipeline (using singularity) successfully for 14 of my 16 libraries. For two of them I got an error, which I tracked back to vartrix (in vartrix.err). The error is this:

Traceback (most recent call last):
  File "/opt/souporcell/souporcell_pipeline.py", line 589, in <module>
    vartrix(args, final_vcf, bam)
  File "/opt/souporcell/souporcell_pipeline.py", line 512, in vartrix
    subprocess.check_call(cmd, stdout = out, stderr = err)
  File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['vartrix', '--mapq', '30', '-b', '/home/yaraScratch/souporcell-F1678CM-AB2-Sc-4-1/souporcell_minimap_tagged_sorted.bam', '-c', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/barcodes.tsv', '--scoring-method', 'coverage', '--threads', '8', '--ref-matrix', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/ref.mtx', '--out-matrix', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/alt.mtx', '-v', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/souporcell_merged_sorted_vcf.vcf.gz', '--fasta', '/home/yara/Scratch/references/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa', '--umi']' returned non-zero exit status 101.

I emailed the crash reports to the vartrix authors and they replied that I should try a newer version of vartrix (https://github.com/10XGenomics/vartrix/releases/tag/v1.1.22). So, my questions are: is there a way around this? How could I do this? Would it be possible for you to add this version to souporcell_latest.sif?

Many thanks for your help!

wheaton5 commented 2 years ago

I'll do this as soon as I can. I need a computer with admin access (work computer doesn't have that) and I'm trying to find the charger to my 2011 macbook air lol. I just moved and it wasn't in the same box as the computer... In the meantime, you could run vartrix manually, add the output files to that folder along with a vartrix.done file, and then restart the pipeline. It will see the vartrix.done file and continue from the next step. Just use the same arguments as in the error message above.
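The manual workaround described above could be scripted roughly as follows. This is a minimal sketch, not part of souporcell: the helper names `build_vartrix_cmd` and `run_vartrix_and_mark_done` are hypothetical, the flags and output file names (`ref.mtx`, `alt.mtx`, `vartrix.done`) are copied from the failing command and the advice in this thread, and it assumes a newer `vartrix` binary is available on `PATH`.

```python
import subprocess
from pathlib import Path


def build_vartrix_cmd(outdir, bam, barcodes, vcf, fasta, threads=8):
    """Rebuild the vartrix call shown in the traceback (hypothetical helper)."""
    outdir = Path(outdir)
    return [
        "vartrix", "--mapq", "30",
        "-b", str(bam),
        "-c", str(barcodes),
        "--scoring-method", "coverage",
        "--threads", str(threads),
        "--ref-matrix", str(outdir / "ref.mtx"),
        "--out-matrix", str(outdir / "alt.mtx"),
        "-v", str(vcf),
        "--fasta", str(fasta),
        "--umi",
    ]


def run_vartrix_and_mark_done(outdir, cmd):
    """Run the command; create vartrix.done only if it exits with status 0."""
    outdir = Path(outdir)
    with open(outdir / "vartrix.out", "w") as out, open(outdir / "vartrix.err", "w") as err:
        # check_call raises CalledProcessError on a non-zero exit status,
        # so the marker file below is never written after a crash.
        subprocess.check_call(cmd, stdout=out, stderr=err)
    (outdir / "vartrix.done").touch()
```

After the marker file exists, restarting `souporcell_pipeline.py` with the same output directory should pick up from the step after vartrix, as described above.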

changostraw commented 2 years ago

I also keep getting a vartrix crash. However, I cannot find the crash report.

Well, this is embarrassing.

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/report-cb2042fa-804e-491a-bc56-91f750318372.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

There is no tmp/ directory in the working directory, so I am not sure where the report was saved. Thanks!

wheaton5 commented 2 years ago

Is there a vartrix.err file?

changostraw commented 2 years ago

Yes, that is all the vartrix.err file contains:

Well, this is embarrassing.

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/report-cb2042fa-804e-491a-bc56-91f750318372.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

Authors: Ian Fiddes ian.fiddes@10xgenomics.com, Patrick Marks patrick@10xgenomics.com

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

I cannot locate the report anywhere, at least not in directories I have permissions for.
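One thing worth noting is that the path in the message is absolute (`/tmp/...` at the filesystem root of whatever machine or container vartrix ran in), not a `tmp/` folder inside the working directory. Below is a small sketch for pulling that path out of `vartrix.err` and checking whether it is still there. The function name `find_crash_report` is made up for illustration, and the regular expressions only assume the message wording quoted in this thread, including the terminal colour codes that appear as `^[[0m` in the pasted output.

```python
import re
from pathlib import Path

# Colour codes: either a real ESC byte or the "^[" that editors display for it.
ANSI_ESCAPE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")
REPORT_PATH = re.compile(r'report file at "([^"]+\.toml)"')


def find_crash_report(err_text):
    """Return the crash report path named in vartrix.err, or None if absent."""
    clean = ANSI_ESCAPE.sub("", err_text)
    match = REPORT_PATH.search(clean)
    return Path(match.group(1)) if match else None


if __name__ == "__main__":
    err_file = Path("vartrix.err")
    if err_file.exists():
        report = find_crash_report(err_file.read_text())
        if report is None:
            print("no crash report path found in vartrix.err")
        else:
            print(report, "exists" if report.exists() else "is not visible from this machine")
```

If the job ran on a cluster compute node or inside a container, the report may only have existed in that environment's `/tmp`, which could explain why it cannot be found afterwards. That is a guess and not something confirmed in this thread.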

wheaton5 commented 2 years ago

It might be something upstream to vartrix and we are giving vartrix bad input. What does the vcf look like? Can you try running vartrix manually?

changostraw commented 2 years ago

I think it was my vcf file. It had been aligned to hg19 by the sequencing centre, not hg38 like my bam file. It is running fine now that I have lifted it over to hg38. I am now having a problem with the clustering, but I will open another issue for that. Thanks!
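A build mismatch like this can sometimes be caught before running the pipeline by comparing the contig lengths declared in the VCF header with the FASTA index. The sketch below assumes the VCF header has `##contig=<ID=...,length=...>` lines (many VCFs do not, in which case it reports nothing) and that a samtools-style `.fai` index exists next to the reference FASTA. All function names are hypothetical.

```python
import re

CONTIG_LINE = re.compile(r"##contig=<ID=([^,>]+),.*?length=(\d+)")


def vcf_contig_lengths(header_lines):
    """Map contig name -> length for ##contig header lines that declare a length."""
    lengths = {}
    for line in header_lines:
        match = CONTIG_LINE.match(line)
        if match:
            lengths[match.group(1)] = int(match.group(2))
    return lengths


def fai_contig_lengths(fai_lines):
    """Map contig name -> length from a .fai index (tab-separated: name, length, ...)."""
    lengths = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def contig_mismatches(vcf_lengths, fasta_lengths):
    """Contigs that are missing from the FASTA or declared with a different length.

    Returns {contig: (length in VCF, length in FASTA or None)}.
    """
    return {
        name: (length, fasta_lengths.get(name))
        for name, length in vcf_lengths.items()
        if fasta_lengths.get(name) != length
    }
```

For example, chr1 is 249,250,621 bp in hg19 but 248,956,422 bp in GRCh38, so a VCF header built against hg19 would be flagged when compared with a GRCh38 `genome.fa.fai`.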

LorenzoMerotto commented 2 years ago

I have the same problem here. I have 4 libraries, analyzed with the same CellRanger version and the same reference genome. The analysis completed for 3 out of 4 samples, while one of them crashed with the same error message.

wheaton5 commented 2 years ago

Can u provide the contents of any of the .err files?

LorenzoMerotto commented 2 years ago

We have generated a report file at "/tmp/1555006.1.bigmem.q/report-be2ad8f9-943e-415c-87b3-02b37277f039.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!


However the temporary directory does not seem to exist

- This is the content of the `retag.err` file

[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 3 files and 1 in-memory blocks...
[bam_sort_core] merging from 3 files and 1 in-memory blocks...


- This is the content of the `bcftools.err`

Writing to /tmp/bcftools-sort.d0HlEo
Merging 1 temporary files
Cleaning
Done

drneavin commented 1 year ago

@LorenzoMerotto and @wheaton5, did you work this out? We are running into a similar issue where most pools have executed correctly but a couple haven't. They were all processed the same way upstream of this step, so the reason for the failure is not clear. Any input you have would be fantastic!

Thanks for your help!

wheaton5 commented 1 year ago

I think I need more information. Usually when vartrix fails, it's due to a previous error, probably freebayes failing. Can you check all .err files and also whether the vcf output from freebayes is empty?
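Both checks can be done quickly with a few lines of Python. This is only a sketch with hypothetical helper names: it lists the size of every `*.err` file in the output directory and counts the non-header lines of a VCF, whether plain text or gzip/bgzip-compressed (for example the `souporcell_merged_sorted_vcf.vcf.gz` path from the traceback earlier in this thread).

```python
import gzip
from pathlib import Path


def count_vcf_records(vcf_path):
    """Count non-empty, non-header lines in a plain or gzip/bgzip-compressed VCF."""
    vcf_path = Path(vcf_path)
    opener = gzip.open if vcf_path.suffix == ".gz" else open
    with opener(vcf_path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def err_file_sizes(outdir):
    """Return {file name: size in bytes} for every *.err file in outdir."""
    return {path.name: path.stat().st_size for path in sorted(Path(outdir).glob("*.err"))}


if __name__ == "__main__":
    outdir = Path(".")  # assumption: run from inside the souporcell output directory
    for name, size in err_file_sizes(outdir).items():
        print(f"{name}\t{size} bytes")
    for vcf in sorted(outdir.glob("*.vcf*")):
        if vcf.suffix in (".vcf", ".gz"):
            print(f"{vcf.name}\t{count_vcf_records(vcf)} records")
```

A record count of zero, or a missing VCF, would point to a failure before vartrix rather than in vartrix itself.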

drneavin commented 1 year ago

Thanks for the fast response @wheaton5 !

I can't see anything in particular that jumps out as a problem with any of the preceding steps but I've put details below so hopefully you see something that we've missed.

Here's a summary of the files generated in the failed pool:

-rw-r--r-- 1        27794 Oct 27 13:47 fastqs.done
-rw-r--r-- 1         4698 Oct 27 17:25 minimap.err
-rw-r--r-- 1         2182 Oct 27 17:25 remapping.done
-rw-r--r-- 1         1023 Oct 27 17:51 retag.err
-rw-r--r-- 1      43123150821 Oct 27 20:52 souporcell_minimap_tagged_sorted.bam
-rw-r--r-- 1      6012584 Oct 27 21:08 souporcell_minimap_tagged_sorted.bam.bai
-rw-r--r-- 1           0 Oct 27 21:09 retagging.done
-rw-r--r-- 1     62832057 Oct 27 21:30 depth_merged.bed
-rw-r--r-- 1    367251435 Oct 27 21:31 common_variants_covered_tmp.vcf
-rw-r--r-- 1    367258040 Oct 27 21:31 common_variants_covered.vcf
-rw-r--r-- 1          135 Oct 27 21:31 variants.done
-rw-r--r-- 1            0 Oct 27 21:31 vartrix.out
-rw-r--r-- 1          605 Oct 27 22:37 vartrix.err

Here are the contents of each of the error files.

[bam_sort_core] merging from 9 files and 1 in-memory blocks...
[bam_sort_core] merging from 16 files and 1 in-memory blocks...
[bam_sort_core] merging from 17 files and 1 in-memory blocks...
[bam_sort_core] merging from 20 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 23 files and 1 in-memory blocks...
[bam_sort_core] merging from 23 files and 1 in-memory blocks...
[bam_sort_core] merging from 24 files and 1 in-memory blocks...
[bam_sort_core] merging from 33 files and 1 in-memory blocks...

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/report-4980e9d6-bfc5-407e-9cec-3ba62c19145b.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!


The freebayes vcf looks normal to me and has 1,181,356 variants. Here's the top and bottom of the file:

##fileformat=VCFv4.1
##FILTER=
##filedate=2022.8.29
##source=Minimac4.v1.0.2
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##INFO=
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MP11 MP12 MP13 MP14 MP15 MP16 MP17 MP18 MP19 MP20 MP21
chr1 788439 1:788439:T:A T A . PASS AF=0.07287;MAF=0.07287;R2=0.47297;IMPUTED;AC=2;AN=22 GT:DS:GP 0|0:0.038:0.962,0.037,0 0|0:0.182:0.823,0.171,0.005 1|0:0.893:0.107,0.893,0 0|0:0.059:0.942,0.057,0.001 0|0:0.072:0.93,0.069,0.001 0|0:0.173:0.834,0.158,0.007 0|1:0.68:0.383,0.553,0.064 0|0:0.055:0.945,0.054,0.001 0|0:0.058:0.943,0.057,0.001 0|0:0.059:0.942,0.057,0.001 0|0:0.058:0.943,0.056,0.001
chr1 791101 1:791101:T:G T G . PASS AF=0.83234;MAF=0.16766;R2=0.41176;IMPUTED;AC=19;AN=22 GT:DS:GP 1|1:1.676:0.026,0.272,0.702 1|1:1.483:0.035,0.448,0.517 1|0:0.964:0.036,0.964,0 1|1:1.841:0.006,0.146,0.848 1|1:1.83:0.007,0.156,0.837 1|0:1.227:0.098,0.578,0.325 1|0:1.139:0.126,0.609,0.265 1|1:1.852:0.005,0.137,0.857 1|1:1.843:0.006,0.145,0.849 1|1:1.85:0.006,0.139,0.855 1|1:1.844:0.006,0.144,0.85

...

chr9 138122079 9:138122079:C:T C T . PASS AF=0.80279;MAF=0.19721;R2=0.86081;IMPUTED;AC=13;AN=22 GT:DS:GP 0|1:0.978:0.022,0.978,0 1|1:1.92:0,0.079,0.921 0|1:1.021:0.001,0.977,0.022 0|1:0.979:0.022,0.977,0.001 0|1:0.969:0.031,0.969,0 0|1:1.018:0.002,0.979,0.019 0|1:0.986:0.029,0.956,0.015 0|1:0.975:0.025,0.975,0 1|1:1.862:0.004,0.129,0.866 0|1:0.994:0.011,0.985,0.005 0|1:0.988:0.012,0.988,0
chr9 138123517 9:138123517:C:T C T . PASS AF=0.36297;MAF=0.36297;R2=0.85195;IMPUTED;AC=7;AN=22 GT:DS:GP 0|1:0.985:0.015,0.985,0 0|0:0.25:0.751,0.248,0.001 0|1:0.983:0.017,0.982,0 0|0:0.005:0.995,0.005,0 0|1:0.98:0.02,0.98,0 0|1:0.997:0.01,0.982,0.007 0|1:0.976:0.024,0.976,0 0|0:0.003:0.997,0.003,0 0|1:1.155:0.044,0.758,0.198 0|1:0.981:0.019,0.98,0 0|0:0.003:0.997,0.003,0



Let me know if you see something that we're missing or if there are additional details we can provide to help identify the issue.

wheaton5 commented 1 year ago

What is the deal with the multisample vcf? I thought freebayes is run in a mode that treats the input as an unknown mixture of samples and outputs a single-sample vcf.

wheaton5 commented 1 year ago

Are u using known_genotypes? Can you post your command line arguments?

drneavin commented 1 year ago

We are not using known_genotypes but we are using common_variants and the vcf we're using is a vcf that has the variants for the individuals in the pool. This is typically how we run souporcell so I don't think that is likely to be causing the error. Here's the command being run:

souporcell_pipeline.py \
-i $BAM \
-b $BARCODES \
-f $FASTA \
-t $THREADS \
-o $SOUPORCELL_OUTDIR \
-k $N \
--common_variants $VCF

wheaton5 commented 1 year ago

You could try running vartrix manually with the latest vartrix? I made a new singularity build recently to include hisat2, which gives better alignments for variant calling, and I could update vartrix as well if that fixes things.

drneavin commented 1 year ago

We're trying that now - will let you know how it goes.

I hadn't seen that you had made a new singularity build. We'll take a look and see if the updated version helps sort things out.

wheaton5 commented 1 year ago

It's not up yet. I'm testing it now.

LorenzoMerotto commented 1 year ago

@drneavin I solved it by running the analysis through conda. I created a new env and installed the required dependencies.

drneavin commented 1 year ago

Great, thanks both! I can confirm that the issue was resolved with the newest version of vartrix. @wheaton5, might be good to update it in the new image as well.

wheaton5 commented 1 year ago

thanks, i will update it in the new singularity build

changostraw commented 1 year ago

I am also having this issue. I also posted on the Demuxafy board, as I am using the Demuxafy singularity image to run souporcell. Have the images been updated since this discussion? Or should I also run vartrix separately? Thanks!

Angel-Wei commented 8 months ago

Hi @drneavin , may I ask how the assignment of clusters to individuals in the pool is usually done in your case following this command? I'm a bit confused by the VCF files given to known_genotypes and common_variants. On some of my pooled samples, my initial attempts including known_genotypes and known_genotypes_sample_names couldn't complete and stalled at the clustering. I'd like to give it a try using the command option you recommended if it works well. Thank you so much!

drneavin commented 8 months ago

Hi @Angel-Wei, I have put together some wrappers for demultiplexing and doublet-detecting methods in Demuxafy. The script I think you're looking for is run after souporcell and correlates the genotypes in the vcf output by souporcell with the genotypes in your own vcf; you can find it here. Or if you just want to run the script without downloading the Demuxafy singularity image, you can find that script here.
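To illustrate the general idea only (this is not the Demuxafy script, and the function names and inputs are made up): given genotype dosages for each souporcell cluster and for each individual at the same set of variants, each cluster can be assigned to the individual whose genotypes correlate best with it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation: correlation is undefined, treat as uninformative
    return cov / sqrt(var_x * var_y)


def best_matches(cluster_genotypes, individual_genotypes):
    """For each cluster, return (best-correlated individual, correlation)."""
    result = {}
    for cluster, cluster_gt in cluster_genotypes.items():
        scores = {
            sample: pearson(cluster_gt, sample_gt)
            for sample, sample_gt in individual_genotypes.items()
        }
        best = max(scores, key=scores.get)
        result[cluster] = (best, scores[best])
    return result


if __name__ == "__main__":
    # Toy dosages (0, 1 or 2 alternate alleles) at five shared variants; values are invented.
    clusters = {"0": [0, 1, 2, 0, 1], "1": [2, 1, 0, 2, 0]}
    individuals = {"MP11": [0, 1, 2, 0, 1], "MP12": [2, 1, 0, 2, 1]}
    for cluster, (sample, r) in best_matches(clusters, individuals).items():
        print(f"cluster {cluster} -> {sample} (r = {r:.2f})")
```

In practice the dosages would have to be parsed from the two VCFs and restricted to variants present in both, and low correlations or ties would need manual review.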

If you have any followup questions about Demuxafy or this script, it would probably be best to open an issue here.

Angel-Wei commented 8 months ago

Hi @drneavin! Thank you so much for the quick response! Yes, I was also looking at Demuxafy, and the documentation was really clear and easy to follow. I had mistakenly thought there was another pipeline, other than Demuxafy, that I wasn't aware of. I can surely proceed with that. Thank you so much!

Angel-Wei commented 8 months ago

Hi @drneavin! Sorry to bug you again. But if you don't mind, can I ask one more question? Is there supposed to be any difference between using common_variants and not using it when running the pipeline in a genotype-free manner (i.e. without known_genotypes and known_genotypes_sample_names)? My run with the recommended command hasn't completed yet, but I assume the difference is that including common_variants outputs a common_variants_covered.vcf file? Thank you so much!