markschl / seqtool

Fast and flexible tool for reading, modifying and writing biological sequences
MIT License

Improve variables and formulas #4

Closed markschl closed 4 months ago

markschl commented 2 years ago

I'm planning to switch from using "variables with options" (name:option1:option2) to a more familiar function-style name(option1, option2) syntax.

Maybe also introduce object/module-like prefixes, possibly a syntax like seq.count('GC'). This should however integrate well with the formulas (below).

Formulas: I'd like to switch from ExprTk to a familiar scripting language, most likely JavaScript via QuickJS. Initial tests showed that it is sufficiently fast for our purpose and doesn't increase binary size by more than 1 MiB relative to having no formulas (ExprTk actually uses more space!). In most cases a full-fledged scripting language is unnecessary, since formulas, e.g. for filtering sequences, are usually very simple. However, the whole power of JavaScript will be available if necessary (possibly with restricted features, by turning off some of the functionality). Since it would be possible to return strings, not only booleans or floats (as in ExprTk), the 'split' command will be more powerful. I could even imagine allowing external scripts using something like st filter 'file:filter_expr.js' sequences.fasta. It would also be possible to offer a global object for sharing information between different sequence records.
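To make the idea concrete, here is a minimal sketch of what an external script such as filter_expr.js might contain. It is an illustration only: the names `keep`, `splitKey`, `record` and `shared` are assumptions and not part of seqtool, and the loop at the end merely stands in for the tool feeding records to the script.

```javascript
// Hypothetical sketch: none of these names are part of seqtool's actual API.
// `record` stands in for the per-record variables the tool would expose,
// `shared` for the proposed global object that persists across records.

// Filter: keep sequences of at least 5 bases with more than 50% GC,
// and drop exact duplicates seen earlier in the same run.
function keep(record, shared) {
  const gcCount = (record.seq.match(/[GC]/gi) || []).length;
  if (record.seq.length < 5 || gcCount / record.seq.length <= 0.5) return false;
  if (shared.seen.has(record.seq)) return false;
  shared.seen.add(record.seq);
  return true;
}

// Split key: a string result, which a boolean/float-only engine cannot return.
function splitKey(record) {
  return record.seq.length >= 8 ? "long" : "short";
}

// Stand-in for the host loop that would run over the input file.
const shared = { seen: new Set() };
const records = [
  { id: "a", seq: "GGCCGGAT" },
  { id: "b", seq: "ATATATAT" },
  { id: "c", seq: "GGCCGGAT" }, // duplicate of "a"
  { id: "d", seq: "GCGCG" },
];
const kept = records.filter((r) => keep(r, shared));
const bins = kept.map((r) => `${r.id}:${splitKey(r)}`);
```

The duplicate check shows why a shared object is useful: the decision for record "c" depends on a record seen earlier, which a stateless per-record formula cannot express.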

It still needs to be investigated how we can integrate this with the 'simple' variable system, especially if using an 'object-like' syntax. It would likely be too complicated if the above 'functions' were real Rust functions called from JavaScript. I'd rather evaluate them beforehand (before the script is run, so arguments cannot depend on variables from the script) and modify the script to replace the function calls with similarly named simple variables. E.g. seq.count('GC') becomes seq_count_GC, or seq.count_GC if we decide to define a real seq object in JavaScript. Offering a more comprehensive API with all sequence record properties fully accessible in JavaScript could be explored for later versions of seqtool. If continuing on this road, it could even make sense to transform the Rust code into a JavaScript or Python library, so seqtool would then be a high-level script relying on the Rust library for reading and otherwise handling the records. However, I'm not sure at all whether this could be done without too much of a negative impact on speed.
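The rewriting step described above could be sketched roughly as follows. This is a simplified illustration, not seqtool's implementation: the `rewrite` function, the `KNOWN_OBJECTS` set and the regular expression are assumptions, and only the seq_count_GC naming follows the example in the comment. It accepts quoted literal arguments only, which matches the constraint that arguments cannot depend on script variables.

```javascript
// Hypothetical sketch of the pre-processing step; all names are assumptions.

// Only calls on these objects are treated as seqtool variables.
const KNOWN_OBJECTS = new Set(["seq"]);

// Matches object.method('a', "b", ...) where every argument is a quoted literal.
const CALL =
  /\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\(\s*((?:'[^']*'|"[^"]*")(?:\s*,\s*(?:'[^']*'|"[^"]*"))*)?\s*\)/g;

function rewrite(script) {
  const variables = new Map(); // generated name -> { object, method, args }
  const rewritten = script.replace(CALL, (match, object, method, argList) => {
    if (!KNOWN_OBJECTS.has(object)) return match; // ordinary JavaScript call, leave as is
    const args = argList
      ? argList.match(/'[^']*'|"[^"]*"/g).map((a) => a.slice(1, -1))
      : [];
    // Build an identifier-safe variable name, e.g. seq_count_GC.
    const name = [object, method, ...args].join("_").replace(/\W/g, "_");
    variables.set(name, { object, method, args });
    return name;
  });
  return { rewritten, variables };
}

const { rewritten, variables } = rewrite(
  "seq.count('GC') / seq.count('AT') > 1.5 && id.startsWith('x')"
);

// The host would compute each variable per record before running the script;
// the values here are made up for the demonstration.
const values = { seq_count_GC: 6, seq_count_AT: 2 };
const names = [...variables.keys()];
const formula = new Function(...names, "id", `return (${rewritten});`);
const result = formula(...names.map((n) => values[n]), "x1");
```

A regex-based rewrite has known limits: it would also rewrite matching text inside string literals, and different argument lists can collide on the same name (`'G_C'` versus `'G', 'C'`). A real implementation would probably need a proper tokenizer and a collision-free naming scheme.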

markschl commented 4 months ago

QuickJS has been incorporated instead of ExprTk. The documentation has been updated.