quinlan-lab / vcf2db

create a gemini-compatible database from a VCF
MIT License
55 stars 13 forks source link

expand functionality of "--impacts-field" argument #30

Closed matthdsm closed 7 years ago

matthdsm commented 7 years ago

Hi Brent,

I'd like to request some more changes regarding chapmanb/bcbio-nextgen#1832.

Edit: apparently my second comment is already taken into account, great!

Thanks M

matthdsm commented 7 years ago

Also, Would it be possible to build in a catch for the warning thrown at https://github.com/quinlan-lab/vcf2db/blob/master/vcf2db.py#L409 ? This results in a KeyError at line 803 and causes the loading to stop.

brentp commented 7 years ago

what constitutes and "extra" info field? You can make this functionality your self and send the complement to vcf2db.

Re the warning, that means you requested an impacts-extra that wasn't in the VCF. Perhaps I should just raise an error there or, I could subset impacts_extras to those seen in the VCF, but why would you send fields that are not in the VCF?

matthdsm commented 7 years ago

Hi Brent, with an "extra" info field I mean the fields outside the CSQ tag. For example we, we annotate with every column from dbNSFP, which makes for a lot of extra arguments. It would be more user friendly if a all option was available. This leads to my second point about the error. I would be nice if this would be caught without interrupting to loading. No one would willingly add a field that isn't there, but typo's can always happen. I think the warning would suffice, without stopping the process.

matthdsm commented 7 years ago

Just to be clear, I really appreciate the work you've done on this. Any remaining issue is really minor, but would be nice to have.

M

brentp commented 7 years ago

no problem. I want to make this useful.

can you just clarify why you would see the warning from line 409? The only case I can think of is when you have mis-specified the impacts-extras columns.

matthdsm commented 7 years ago

Hi Brent, That was exactly the case. When you're dealing with 100+ extra column names, a typo is bound to sneak in somewhere, which is interpreted as a non existent info tag, resulting in the warning above. As I said, this can easily be fixed on our side by just re running the script, but it would be nice if we could just have vcf2db throw a warning and then proceed. In our case, we have 48 simultaneous instances running on torque, so it's kind of annoying to check every log to see if the script exited succesfully.

M

brentp commented 7 years ago

I just pushed a change for this. I'm thinking about erroring if you specify a column that isn't in the header so it doesn't get propagated all the way through loading and into the table when it doesn't exists. comments?

matthdsm commented 7 years ago

Hi Brent, would it be possible to throw an error, but still go through with loading, while skipping the column name that isn't there? Or does this make no sense to you? I know it sound kind of counter intuitive to "allow" an error to happen, but it'd be nice when running many simultaneous runs.

Thanks M

brentp commented 7 years ago

That's how it is now but it may change in the future to be stricter. Can you give it a try?

matthdsm commented 7 years ago

Hi Brent,

I'll give it shot tomorrow! Thanks!

M

matthdsm commented 7 years ago

Hi Brent,

Seems to work beautifully. Great work! Thanks!

brentp commented 7 years ago

ok. let me know if you hit any other problems.