Open bensesbg opened 3 years ago
Did you ever find out what causes this? I'd be very interested to know
I am seeing a very similar error right now:
29689 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in <module>
fit = sm.optimizing(data=counts_dat)
File "/opt/conda/lib/python3.6/site-packages/pystan/model.py", line 542, in optimizing
fit = self.fit_class(data, seed)
File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts; base type=int (in 'unknown file name' at line 8)
Do we need to tell PyStan that the cluster_allele_counts variable contains integers?
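One way to narrow this down is to inspect the data dictionary just before it is handed to `sm.optimizing`. The following is a minimal sketch, not part of souporcell: `find_non_integer_entries` and `is_integer_valued` are hypothetical helper names, and it assumes the data is a dict of (possibly nested) Python lists. If the values are NumPy arrays, call `.tolist()` on them first. It flags NaN, infinite, or fractional entries, on the assumption that these are the kinds of values Stan rejects for an `int` variable.

```python
import math


def _walk(value, path=()):
    """Yield (index_path, scalar) pairs from arbitrarily nested lists/tuples."""
    if isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            yield from _walk(item, path + (i,))
    else:
        yield path, value


def is_integer_valued(x):
    """True for ints and for finite floats with no fractional part."""
    if isinstance(x, int):
        return True
    if isinstance(x, float):
        return math.isfinite(x) and x.is_integer()
    return False


def find_non_integer_entries(data):
    """Map each variable name to the (index_path, value) pairs that are not integer-valued."""
    problems = {}
    for name, values in data.items():
        bad = [(path, v) for path, v in _walk(values) if not is_integer_valued(v)]
        if bad:
            problems[name] = bad
    return problems


# Illustrative data only; the real dictionary is built inside consensus.py.
example = {
    "cluster_allele_counts": [[1, 2], [3, float("nan")]],
    "cluster_allele_counts_soup": [[0, 1.5]],
}
for name, bad in find_non_integer_entries(example).items():
    print(name, bad)
```

If this reports NaN entries, the next question is where they come from upstream, for example a division by a zero count; that part is a guess and would need checking against consensus.py itself.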
The PyStan version changes things, and PyStan is also sensitive to the Python version. If you use my conda environment, I think this should go away.
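When comparing environments, it may help to record which Python and PyStan versions are actually in use. A small sketch, assuming the PyStan 2 series (which is imported as `pystan` and exposes `__version__`); `report_versions` is a hypothetical helper name, not something souporcell provides:

```python
import platform


def report_versions():
    """Return the running Python version and the PyStan version, if importable."""
    versions = {"python": platform.python_version()}
    try:
        import pystan  # PyStan 2.x import name; assumed here
        versions["pystan"] = getattr(pystan, "__version__", "unknown")
    except ImportError:
        # PyStan is not installed under this name in the current interpreter.
        versions["pystan"] = None
    return versions


print(report_versions())
```

Running this inside the container or conda environment that produced the traceback makes it easier to tell whether two reports of the error come from the same setup.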
I see exactly the same issue, running souporcell in the singularity container that I downloaded a couple weeks ago. Several samples have worked fine, now this error:
169910 excluded for potential RNA editing
5971 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in <module>
fit = sm.optimizing(data=counts_dat)
File "/usr/local/envs/py36/lib/python3.6/site-packages/pystan/model.py", line 472, in optimizing
fit = self.fit_class(data, seed)
File "stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.pyx", line 459, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_355834653533342947.StanFit4Model.__cinit__
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts_soup; base type=int (in 'unknown file name' at line 9)
I notice in the 'clusters.tsv' file for this sample that basically all cells are 'unassigned':
Count Assignment
3 doublet 0/1
1 singlet 0
1 status assignment
5739 unassigned 0
104 unassigned 0/1
86 unassigned 1
39 unassigned 1/0
Can the absence of a singlet with class=1 be the cause of the error?
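To check this across samples, the tally above can be reproduced and extended to list clusters that have no singlets at all. This is a rough sketch under stated assumptions: it relies only on the `status` and `assignment` columns shown in the header row above, `singlets_per_cluster` is a hypothetical helper name, and the `barcode` column in the sample data is illustrative. Whether a cluster without singlets actually triggers the error is not confirmed in this thread.

```python
import csv
import io
from collections import Counter


def singlets_per_cluster(handle):
    """Tally (status, assignment) pairs and count singlets for every cluster seen."""
    reader = csv.DictReader(handle, delimiter="\t")
    pairs = Counter((row["status"], row["assignment"]) for row in reader)

    # Collect every cluster id, including those that only appear in "0/1" style pairs.
    clusters = set()
    for _status, assignment in pairs:
        clusters.update(assignment.split("/"))

    singlets = {c: pairs.get(("singlet", c), 0) for c in sorted(clusters)}
    return pairs, singlets


# Tiny illustrative input; for a real run use: open("clusters.tsv", newline="")
sample = io.StringIO(
    "barcode\tstatus\tassignment\n"
    "AAAC-1\tsinglet\t0\n"
    "AAAG-1\tunassigned\t0\n"
    "AAAT-1\tunassigned\t1\n"
    "AACA-1\tdoublet\t0/1\n"
)
pairs, singlets = singlets_per_cluster(sample)
empty = [c for c, n in singlets.items() if n == 0]
print("clusters with no singlets:", empty)
```

Comparing the list of empty clusters between samples that fail and samples that succeed would show whether the pattern holds.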
Greetings! I was wondering if you might be able to help resolve an issue we are encountering during the consensus.py step which is generating the following output/error:
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_c58d6755a445ee1723e096eb7e36ea75 NOW.
14884452 excluded for potential RNA editing
25990 doublets excluded from genotype and ambient RNA estimation
0 not used for soup calculation due to possible RNA edit
Traceback (most recent call last):
File "/opt/souporcell/consensus.py", line 348, in
fit = sm.optimizing(data=counts_dat)
File "/usr/local/lib/python3.8/site-packages/pystan/model.py", line 542, in optimizing
fit = self.fit_class(data, seed)
File "pystan_yvxd5ae2/stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_7285297220659911018.pyx", line 479, in stanfit4anon_model_c58d6755a445ee1723e096eb7e36ea75_7285297220659911018.StanFit4Model.cinit
RuntimeError: Exception: int variable contained non-int values; processing stage=data initialization; variable name=cluster_allele_counts_soup; base type=int (in 'unknown file name' at line 9)
Our current workflow calls the compile_stan_model.py and consensus.py steps of the pipeline with the following commands:
python3.8 /opt/souporcell/compile_stan_model.py && python3.8 /opt/souporcell/consensus.py -a out_matrix.mtx -c clusters.tsv -r ref_matrix.mtx -v 1000G_acan_hg38_snps_mainchr.vcf --soup_out ambient_rna.txt --vcf_out cluster_genotypes.vcf --output_dir .
This seems to work for the majority of our samples, but there appears to be an edge case that throws this error in a couple of them. Any help you can provide to assist us in determining the cause of this would be highly appreciated.