Closed Michael-Geuenich closed 3 years ago
I'm also interested in the quant
functionality and I have the same problem as @Michael-Geuenich. The extract
and genotype
steps worked fine.
Also just for your information, when I try to run the customize
step, I get the following error:
Traceback (most recent call last):
File "/home/dtang/src/test/arcasHLA-quant/scripts/customize.py", line 43, in <module>
from Bio.Alphabet import generic_dna
File "/home/ngs/Analysis/tools/arcashla_venv/lib/python3.9/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
I guess I need to use an older version of Biopython.
Hello! Thanks for your interest in our tool.
@Michael-Geuenich you'll want to run the "customize" script before the "quant" script, which will construct the missing reference. Use the genotype.json from the "genotype" step:
arcasHLA customize --transcriptome chr6 --genotype sample.genotype.json -o ref
The README will be updated to make this clearer.
@davetang you're correct, this error is caused by a depreciated object in the latest Biopython release. For a quick fix, you can revert to an older version of Biopython (1.77 or earlier) until we can provide a workaround in an upcoming release.
Hi @tpereachamblee,
thank you for the comment. I downgraded my Biopython version and could run the customize
subcommand. However, I'm getting the following error after running:
arcasHLA quant --ref ref/my_sample -o test my_sample.extracted.1.fq.gz my_sample.extracted.2.fq.gz
Traceback (most recent call last):
File "/home/dtang/Analysis/tools/arcasHLA-quant_down/scripts/quant.py", line 300, in <module>
baf = allele_results[gene]['allele1_count'] / (allele_results[gene]['allele1_count'] + allele_results[gene]['allele2_count'])
ZeroDivisionError: division by zero
May you let me know what the problem could be?
Thanks again, Dave
Hi @tpereachamblee,
thanks for the additional clarification. I've run the command exactly as you specified within the arcasHLA-quant docker container. Unfortunately I am still getting the following error:
Traceback (most recent call last):
File "/home/arcasHLA-quant/scripts/customize.py", line 299, in <module>
build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
File "/home/arcasHLA-quant/scripts/customize.py", line 94, in build_custom_reference
with open('dat/ref/allele_groups.p','rb') as file:
FileNotFoundError: [Errno 2] No such file or directory: 'dat/ref/allele_groups.p'
Any insights on what the issue could be?
Hi @Michael-Geuenich,
I had the same error because I wasn't running arcasHLA
from where I cloned and prepare the repo. I copied the dat
directory from the cloned location into where I was executing arcasHLA
and that circumvented that error. But then I got the divide by zero error, so maybe it's not advisable to copy the folder over. That said, let me know how it works for you!
Cheers, Dave
Thank you both for your helpful feedback. I believe @davetang's solution should work for you @Michael-Geuenich the customize script needs to be updated work from outside of the arcasHLA folder (presently it does not).
@davetang as far as your issue running quant, I am still looking into it and will comment here once I think I know what is going on.
I'm trying to get the allele specific expression of arcasHLA-quant to work. So far I've used arcasHLA to extract all reads that map to chr6, and genotyped my sample using
genotype
(from the original arcasHLA repo).I'm now trying to run the quantification command of arcasHLA-quant. I've tried this by running
arcasHLA quant -o /my_output_folder/ /extracted_fastqs/sample1.extracted.1.fq.gz /extracted_fastqs/sample1.extracted.2.fq.gz
This results in the following error:
Traceback (most recent call last): File "/home/arcasHLA-quant/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Looking at the
quant.py
script, it seems to me that this is because I did not supply a--ref
flag, and the algorithm then tries to doNone + '.idx'
.My first question: what file do I need to provide for the --ref flag and how can I create this?
I'm guessing this should come from the
quant_ref
command referenced in the readme, however, it appears as though this has been renamed tocustomize
? I've tried runningcustomize
instead ofquant_ref
as described in the readme:arcasHLA customize ~/arcas/sample1.genotype.json -o ~/arcas/ref
However, this results in the following error:
usage: arcasHLA customize [options] arcasHLA customize: error: unrecognized arguments: /arcas/sample1.genotype.json
/arcas/sample1.genotype.json
is the output from my genotyping call.Can you advise? In particular, I am confused about whether or not I need a reference and what the purpose of this is? Also, if I have multiple samples, do I need to create one reference per sample or one reference for all samples?