wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

souporcell runs without common variants but breaks when I add the VCF file #221

Open Ahmedalaraby20 opened 8 months ago

Ahmedalaraby20 commented 8 months ago

So I ran souporcell without any common variants and got results that are inconsistent with the hashtag information: cells carrying a single hashtag are split into 3 clusters (attached). souporcell output: https://drive.google.com/file/d/1JGWAGvx33oPItTh-d680kKlsrete2tdY/view?usp=sharing

[image: souporcellvsH]

Next, I tried running souporcell with common variants and skip remap, and this is what I get:

checking modules
imports done
checking bam for expected tags
checking fasta
restarting pipeline in existing directory Ahmed_1GEXsnp
using common variants
8
***** WARNING: File Ahmed_1GEXsnp/depth_merged.bed has inconsistent naming convention for record:
KI270727.1  20503   20713

***** WARNING: File Ahmed_1GEXsnp/depth_merged.bed has inconsistent naming convention for record:
KI270727.1  20503   20713

running vartrix
running souporcell clustering
/opt/souporcell/souporcell/target/release/souporcell -k 8 -a Ahmed_1GEXsnp/alt.mtx -r Ahmed_1GEXsnp/ref.mtx --restarts 100 -b /mnt/ravens/SequencingRuns/Tao/Ahmed1/sample_filtered_feature_bc_matrix/barcodes.tsv --min_ref 10 --min_alt 10 --threads 8
running souporcell doublet detection
Traceback (most recent call last):
  File "/opt/souporcell/souporcell_pipeline.py", line 596, in <module>
    doublets(args, ref_mtx, alt_mtx, cluster_file)
  File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets
    subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
  File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', 'Ahmed_1GEXsnp/alt.mtx', '--refs', 'Ahmed_1GEXsnp/ref.mtx', '--clusters', 'Ahmed_1GEXsnp/clusters_tmp.tsv']' returned non-zero exit status 101.

This is the output I get: souporcellouts.zip

The VCF file from the tutorial did not work for me (the download was an empty file, attached: common_variants_grch38.zip), so I used the file from https://www.dropbox.com/s/4nmm344g4j7pou4/GRCh38_1000G_MAF0.01_GeneFiltered_NoChr.vcf?e=4

I have around 6000 cells from 8 donors.
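
A quick sanity check on the input files might help narrow this down. The sketch below is not part of souporcell. It only counts records per contig in a VCF or BED file, which would show whether a downloaded VCF is actually empty and whether a file mixes `chr`-prefixed and unprefixed contig names. The function names (`contig_counts`, `naming_report`, `shared_contigs`) are made up for illustration, and whether contig naming has anything to do with the troublet failure is an assumption that the log above does not confirm.

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def contig_counts(path):
    """Count data records per contig (first whitespace-separated column).

    Works for VCF and BED alike: blank lines and lines starting
    with '#' (VCF headers) are skipped.
    """
    counts = Counter()
    with open_text(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            counts[line.split(None, 1)[0]] += 1
    return counts


def naming_report(path):
    """Summarise record count and contig naming style of one file."""
    counts = contig_counts(path)
    return {
        "records": sum(counts.values()),  # 0 means no data lines at all
        "chr_prefixed": sum(1 for c in counts if c.startswith("chr")),
        "unprefixed": sum(1 for c in counts if not c.startswith("chr")),
    }


def shared_contigs(path_a, path_b):
    """Return the sorted contig names that appear in both files."""
    return sorted(set(contig_counts(path_a)) & set(contig_counts(path_b)))
```

For example, calling `naming_report` on the common-variants VCF and on `Ahmed_1GEXsnp/depth_merged.bed` would show whether either file has zero records or mixes naming styles, and `shared_contigs` on the two would show which contigs they have in common.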