Closed matthdsm closed 7 years ago
Also, Would it be possible to build in a catch for the warning thrown at https://github.com/quinlan-lab/vcf2db/blob/master/vcf2db.py#L409 ? This results in a KeyError at line 803 and causes the loading to stop.
what constitutes and "extra" info field? You can make this functionality your self and send the complement to vcf2db.
Re the warning, that means you requested an impacts-extra that wasn't in the VCF. Perhaps I should just raise an error there or, I could subset impacts_extras to those seen in the VCF, but why would you send fields that are not in the VCF?
Hi Brent,
with an "extra" info field I mean the fields outside the CSQ tag. For example we, we annotate with every column from dbNSFP, which makes for a lot of extra arguments. It would be more user friendly if a all
option was available.
This leads to my second point about the error. I would be nice if this would be caught without interrupting to loading. No one would willingly add a field that isn't there, but typo's can always happen. I think the warning would suffice, without stopping the process.
Just to be clear, I really appreciate the work you've done on this. Any remaining issue is really minor, but would be nice to have.
M
no problem. I want to make this useful.
can you just clarify why you would see the warning from line 409? The only case I can think of is when you have mis-specified the impacts-extras columns.
Hi Brent, That was exactly the case. When you're dealing with 100+ extra column names, a typo is bound to sneak in somewhere, which is interpreted as a non existent info tag, resulting in the warning above. As I said, this can easily be fixed on our side by just re running the script, but it would be nice if we could just have vcf2db throw a warning and then proceed. In our case, we have 48 simultaneous instances running on torque, so it's kind of annoying to check every log to see if the script exited succesfully.
M
I just pushed a change for this. I'm thinking about erroring if you specify a column that isn't in the header so it doesn't get propagated all the way through loading and into the table when it doesn't exists. comments?
Hi Brent, would it be possible to throw an error, but still go through with loading, while skipping the column name that isn't there? Or does this make no sense to you? I know it sound kind of counter intuitive to "allow" an error to happen, but it'd be nice when running many simultaneous runs.
Thanks M
That's how it is now but it may change in the future to be stricter. Can you give it a try?
Hi Brent,
I'll give it shot tomorrow! Thanks!
M
Hi Brent,
Seems to work beautifully. Great work! Thanks!
ok. let me know if you hit any other problems.
Hi Brent,
I'd like to request some more changes regarding chapmanb/bcbio-nextgen#1832.
First of all, would it be possible to add an
all
option to include all extra info fields into the impacts table? Or some sort of reverse grep to include all<INFO>
fields except the ones specifiedSecond, would it be possible to infer the correct column type from the info fields (assuming the vcf file was annotated correctly)? It's great to be able to override the default, be it would be even more awesome if this could be done automatically.Edit: apparently my second comment is already taken into account, great!
Thanks M