Closed an1lam closed 4 years ago
Don't merge yet, have a better fix actually.
@Avsecz there's one more change that I think makes sense to make: add `force=True` to the call to the `tabix()` function here so that running `MutationMap.query_bed` twice with the same bed file doesn't cause the following error:
OSError: Filename '/home/stephenmalina/project/dat/10_random_tf_peaks.bed.gz' already exists, use *force* to overwrite
Would you prefer a separate PR for this or should I add it here?
It's a tiny change on the same overall code path so I'm inclined to add it here, but I know that's also not ideal Git PR hygiene...
That's fine. You can include that fix in the PR here. Just make sure you add a high-level description of all the changes in the PR at the top.
Cool, will do shortly.
Alright, added the final change and updated the description (first comment) to describe all three changes. Let me know if there's anything else you need from me.
Also, I've been informally testing this by pulling in my changes and using the `get_model` and mutation map workflow for something I'm working on. Anything else I can/should do to test? It seems like CI is failing on master, so presumably that's not enough.
I think the tests on master have some issues with the dependencies. Here is my attempt to fix this: https://github.com/kipoi/kipoi-veff/pull/37
@Avsecz :clap:, success! Thank you!
Released to PyPI as v0.3.1: https://pypi.org/project/kipoi-veff/0.3.1/.
Thanks for the PR!
This includes three sub-changes to fix `query_bed`.

Make it so that null `bed_input` attrs don't trigger exceptions

Since most models in the Kipoi model zoo seem to not have `bed_input` set (with the exception of two, DeepCPG and one other), it seems wrong for `MutationMap` to fail on initialization when this value is unset in the config file.

Change `_get_dl_bed_fields` to return `None` when a `dataloader`'s `postprocessing.variant_effects` attribute doesn't include a `bed_input` field. This will make it so that models like "DeepSEA/variantEffects" can be used with `MutationMap`s.

The `ModelInfoExtractor` class currently causes `MutationMap`'s `__init__` method to throw an exception if you try to create a `MutationMap` instance for a model (e.g. "DeepSEA/variantEffects", see here: https://github.com/kipoi/models/blob/master/DeepSEA/variantEffects/model.yaml) with no `bed_input` attribute under 'postprocessing > variant_effects' in its YAML file. The rest of the subsequent code checks whether `exec_files_bed_keys` is `None` and, in general, seems to be able to handle this case, so presumably this failure is unintentional.

One can reproduce this issue with the following code snippet (assumes relevant packages have been installed):
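The snippet below is not the reproduction script itself but a minimal sketch of the intended lookup behavior, under assumptions: the dataloader description is modeled as a plain nested dict rather than a real Kipoi dataloader object, `get_dl_bed_fields` is a hypothetical stand-in for `_get_dl_bed_fields`, and the field name `intervals_file` is only illustrative.

```python
from typing import Any, List, Mapping, Optional


def get_dl_bed_fields(dataloader_spec: Mapping[str, Any]) -> Optional[List[str]]:
    """Hypothetical stand-in: return the bed_input field names, or None if unset."""
    # Each level may be missing or null in the YAML, so fall back to an empty dict.
    postprocessing = dataloader_spec.get("postprocessing") or {}
    variant_effects = postprocessing.get("variant_effects") or {}
    bed_input = variant_effects.get("bed_input")
    if not bed_input:
        # No bed_input configured: signal this with None instead of raising.
        return None
    if isinstance(bed_input, str):
        # Normalize a single field name to a one-element list.
        return [bed_input]
    return list(bed_input)


# A spec that declares bed_input, and one (like the DeepSEA case described) that does not.
with_bed = {"postprocessing": {"variant_effects": {"bed_input": ["intervals_file"]}}}
without_bed = {"postprocessing": {"variant_effects": {}}}

fields_with = get_dl_bed_fields(with_bed)
fields_without = get_dl_bed_fields(without_bed)
```

Callers can then branch on `None`, which is what the existing `exec_files_bed_keys is None` checks appear to expect.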
Only return non-null `bed3_to_vcf_index` when the temp file is non-empty

This code currently returns a temp file pointer for `bed3_to_vcf_index` even in cases where the file's empty and therefore invalid. This causes problems because, as the following code snippet shows, Python will throw an exception when it tries to read an empty tempfile.

On the other hand, as one might expect, the following code runs successfully.
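As a rough illustration of the guard described here (not the code from this PR), the sketch below returns a temp file's path only when the file has content. The helper names `nonempty_path_or_none` and `write_temp` are hypothetical, and the single VCF-like line is made-up sample data.

```python
import os
import tempfile
from typing import Optional


def nonempty_path_or_none(path: str) -> Optional[str]:
    """Return `path` if the file has at least one byte, otherwise None."""
    return path if os.path.getsize(path) > 0 else None


def write_temp(contents: str) -> str:
    """Write `contents` to a new temp file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".vcf", delete=False) as handle:
        handle.write(contents)
    return handle.name


empty_path = write_temp("")
filled_path = write_temp("chr1\t100\t.\tA\tT\n")

empty_result = nonempty_path_or_none(empty_path)    # empty file is treated as absent
filled_result = nonempty_path_or_none(filled_path)  # non-empty file is passed through

# Clean up the temp files created for the demonstration.
for path in (empty_path, filled_path):
    os.remove(path)
```

Returning `None` for the empty case lets downstream code skip the file instead of failing when it tries to read it.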
Enable `query_bed` to be run twice on the same file

Currently, if you try to run `query_bed` twice with the same file, it will fail with an `OSError` saying that the compressed `.bed.gz` file already exists and that *force* must be used to overwrite it.

We want to overwrite the file it's failing for because the overwrite is non-destructive (it just optimizes the compression) and, assuming the original .bed file is uncompressed, will only overwrite a file previously created by a call to `tabix`.
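To make the overwrite semantics concrete, here is a small self-contained sketch. It does not call the real `tabix` function: `compress_bed` is a hypothetical stand-in that uses plain gzip (not bgzip, and with no index) purely to mimic the "refuse to overwrite unless `force`" behavior, and the file name `peaks.bed` is illustrative.

```python
import gzip
import os
import shutil
import tempfile


def compress_bed(path: str, force: bool = False) -> str:
    """Hypothetical stand-in: gzip `path` to `path + '.gz'`, refusing to overwrite unless forced."""
    target = path + ".gz"
    if os.path.exists(target) and not force:
        raise OSError(f"Filename '{target}' already exists, use *force* to overwrite")
    # The uncompressed source is only read, so re-running never loses data.
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


workdir = tempfile.mkdtemp()
bed_path = os.path.join(workdir, "peaks.bed")
with open(bed_path, "w") as handle:
    handle.write("chr1\t100\t200\n")

first = compress_bed(bed_path)  # first run creates peaks.bed.gz

try:
    compress_bed(bed_path)  # second run without force hits the existing file
    second_failed = False
except OSError:
    second_failed = True

third = compress_bed(bed_path, force=True)  # with force, the rerun succeeds

with gzip.open(third, "rt") as handle:
    roundtrip = handle.read()

shutil.rmtree(workdir)
```

Because the compressed file is always regenerated from the untouched source, forcing the overwrite leaves the original data intact, which is the non-destructive property the change relies on.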