rasmushenningsson / VariantCallFormat.jl

Read and write VCF and BCF files

Support upcoming Automa.jl release #5

Open · rasmushenningsson opened this issue 3 years ago

rasmushenningsson commented 3 years ago

The description below was written by @jakobnissen. Saving it here for future reference!

The rewrite should be minor, with just a few lines changed. I recommend reading this tutorial on Automa and how it works (https://biojulia.net/post/automa1/). Basically, the change is that when you create a Machine, the new version of Automa checks that its actions can be resolved unambiguously, and in the current version of VCF they can't be.

When you try to compile VCF with the latest version of Automa, you get this error, followed by a stack trace: "ERROR: LoadError: LoadError: Ambiguous DFA: Input 0x2e can lead to actions nothing or [:mark]". It means that there is at least one possible input where the byte 0x2e (ASCII '.') can lead to two different actions, and it is impossible to resolve which one to take. To fix it, go into https://github.com/rasmushenningsson/VariantCallFormat.jl/blob/main/src/reader.jl and look for a regex pattern that contains 0x2e where such an ambiguity may arise.
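A toy model may help make that error message concrete. The Python sketch below is not Automa.jl's code or API. It assumes a hypothetical NFA stored as a dict that maps each state to (byte, actions, target) edges, with no epsilon transitions. It runs a plain subset construction and reports every DFA state where a single input byte is reached through edges that carry different action lists. This is a conservative simplification of Automa's real check. The example NFA is also hypothetical: it models a field where '.' may be either a bare missing-value placeholder (no action) or the first byte of a number such as ".5" (which triggers a `mark` action).

```python
from collections import deque

# Hypothetical toy NFA (not Automa.jl's representation):
# state -> list of (input byte, action names, target state); no epsilon edges.
DOT = 0x2E  # ASCII '.'
nfa = {
    0: [
        (DOT, (), 1),              # '.' as a bare missing value: no action
        (DOT, ("mark",), 2),       # '.' starting a number like ".5": mark it
        (ord("5"), ("mark",), 2),  # a digit starting a number: mark it
    ],
    2: [(ord("5"), (), 2), (DOT, (), 2)],  # rest of the number, no actions
}


def find_ambiguities(nfa, start):
    """Subset-construct a DFA from `nfa` and return (nfa_states, byte, action_lists)
    for every DFA state where one byte is reached via different action lists."""
    first = frozenset([start])
    seen = {first}
    queue = deque([first])
    conflicts = []
    while queue:
        current = queue.popleft()
        # Group every outgoing NFA edge of this DFA state by its input byte.
        by_byte = {}
        for state in current:
            for byte, actions, target in nfa.get(state, []):
                by_byte.setdefault(byte, []).append((actions, target))
        for byte, edges in sorted(by_byte.items()):
            action_lists = {actions for actions, _ in edges}
            if len(action_lists) > 1:
                # The DFA cannot tell which action list to run on this byte.
                conflicts.append((sorted(current), byte, sorted(action_lists)))
            successor = frozenset(target for _, target in edges)
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return conflicts


for states, byte, actions in find_ambiguities(nfa, 0):
    print(f"Input {byte:#04x} from NFA states {states} can lead to actions {actions}")
# Input 0x2e from NFA states [0] can lead to actions [(), ('mark',)]
```

In this toy, deleting either of the two conflicting '.' edges from state 0 makes the report empty. In a real grammar the equivalent fix would be to make both branches agree on what happens at '.', or to restructure the regex so the choice is made only after a later byte disambiguates the branches.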

Or to put it even more starkly: With the current version of Automa, there is at least one input that will cause the VCF parser to do the wrong thing, silently.

To debug it, it may be useful to first figure out exactly which Machine raises the error. Then take the regex that produces that machine, convert it to an NFA, export the NFA to a dot file, and visualize it. Then look for places in the graph where 0x2e leads to two distinct paths. (I could do the PR, but I think it's more durable if you learn to debug Automa yourself, and I'm happy to help.) :)
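As a rough illustration of what to look for in such a graph, the Python sketch below renders a hypothetical toy NFA as Graphviz DOT. The toy is not Automa's internal format: it uses a dict that maps each state to (byte, actions, target) edges. Any edges that leave the same state on the same byte but carry different action lists are coloured red. This only catches conflicts between sibling edges. A real ambiguity can also span several NFA states joined by epsilon edges, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical toy NFA: state -> list of (input byte, action names, target state).
nfa = {
    0: [(0x2E, (), 1), (0x2E, ("mark",), 2), (ord("5"), ("mark",), 2)],
    2: [(ord("5"), (), 2), (0x2E, (), 2)],
}


def nfa_to_dot(nfa):
    """Render `nfa` as Graphviz DOT, colouring red every edge whose source
    state has another edge on the same byte with a different action list."""
    lines = ["digraph nfa {", "  rankdir=LR;"]
    for src in sorted(nfa):
        # Collect the distinct action lists used for each byte out of `src`.
        actions_by_byte = defaultdict(set)
        for byte, actions, _ in nfa[src]:
            actions_by_byte[byte].add(actions)
        for byte, actions, dst in nfa[src]:
            label = f"{byte:#04x}"
            if actions:
                label += " / " + ",".join(actions)
            style = ' color="red"' if len(actions_by_byte[byte]) > 1 else ""
            lines.append(f'  {src} -> {dst} [label="{label}"{style}];')
    lines.append("}")
    return "\n".join(lines)


print(nfa_to_dot(nfa))
```

Save the output as nfa.dot and render it with Graphviz, for example `dot -Tsvg nfa.dot -o nfa.svg`. The two red edges out of state 0 are the '.' transitions that disagree about running `mark`.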

See also https://github.com/BioJulia/GeneticVariation.jl/issues/28.

rasmushenningsson commented 2 years ago

The ambiguity checks have been disabled again in Automa.jl v0.8.2. However, we should still:

mashu commented 5 months ago

Any update on this? Are there any alternatives to this package that work with the latest Automa?