troublet command error - Githubissues

zchatt commented 10 months ago

Thank you for this great tool. I am trialing to determine whether we can pool cell-lines for 10X runs and deconvolute using variants. To test I have a xenograft snRNAseq run (10X v3) that is from a single-cell line which I would like to detect using the .vcf from WES of the cell-line. The human population of this xenograft is small and there are only 212 barcodes wihtin my filtered_feature_bc_matrix/barcodes.tsv

I have run (singularity) without specifying "--known_genotypes" or "--known_genotypes_sample_names" and the pipeline runs to completion.

However, when I specify "--known_genotypes" and "--known_genotypes_sample_names" (as below), the pipeline fails. The error message is from the troublets command but I believe something is happening at the vartix step.

There are 4203 variants within my common_variants_covered.vcf. My "--known_genotypes" vcf (WES) has 93219 variants, but the alt.mtx and ref.mtx look strange i.e. dont start from 1 and only have 89 lines.

Scratching my head how to troubleshoot. Any insights would be greatly appreciated.

`# Run script

soup_or_cell_sif="/home/zac/souporcell/souporcell_latest.sif" num_clusters=4 num_threads_to_use=24 output_dir_name="/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4" reference_fasta="/data/zac/asap_snrnaseq/reference/hg38.fa" barcodes_tsv="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/filtered_feature_bc_matrix/barcodes.tsv" bam_file="/data/zac/asap_snrnaseq/snrnaseq/run_HJ377DRX2_12w/outs/possorted_genome_bam.bam" vcf="/data/zac/asap_snrnaseq/souporcell/merged_ppmi.feb.1.2015_RM35.vcf" export SINGULARITY_BIND="/data/zac/"

mkdir -p $output_dir_name cd $output_dir_name

singularity exec $soup_or_cell_sif souporcell_pipeline.py \ -i $bam_file \ -b $barcodes_tsv \ -f $reference_fasta \ -t $num_threads_to_use \ -o $output_dir_name \ -k $num_clusters \ --known_genotypes $vcf \ --known_genotypes_sample_names 7_21_RM35 PPMI_SI_3868`

`# Error Message

Traceback (most recent call last): File "/opt/souporcell/souporcell_pipeline.py", line 596, in doublets(args, ref_mtx, alt_mtx, cluster_file) File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err) File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/alt.mtx', '--refs', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/ref.mtx', '--clusters', '/data/zac/asap_snrnaseq/souporcell/run_HJ377DRX2_12w_RM35vcf4/clusters_tmp.tsv']' returned non-zero exit status 101.`

`head alt.mtx

%%MatrixMarket matrix coordinate real general % written by sprs 4203 212 86 316 5 0 316 10 0 316 17 0 316 24 0 316 33 0 316 36 0 316 43 0

head clusters_tmp.tsv

AAACCCAAGCTGGCCT-1 0 -0.6931472 -0.6931472 AACCAACTCTCTATGT-1 0 -0.6931472 -0.6931472 AACGGGAAGTGATCGG-1 0 -0.6931472 -0.6931472 AACGTCAAGCATTGTC-1 0 -0.6931472 -0.6931472 AAGCCATCAATACGCT-1 0 -0.6931472 -0.6931472 AATGAAGTCGTGGACC-1 0 -0.6931472 -0.6931472 AATTCCTGTTTGACAC-1 0 -0.6931472 -0.6931472 AATTTCCCACGTTGGC-1 0 -0.6931472 -0.6931472 AATTTCCCATTAAAGG-1 0 -0.6931472 -0.6931472 ACAACCACAGGCGATA-1 0 -0.6931472 -0.6931472

head clusters.tsv

barcode status assignment log_prob_singleton log_prob_doublet cluster0 cluster1`