Open biocyberman opened 6 years ago
You can absolutely build a range out of the token parser. This is how iopipe's delimRange works: https://github.com/schveiguy/iopipe/blob/f44dcc8c55b89da2f78926c867310883cc5be547/source/iopipe/textpipe.d#L542
release appears at several level BufRef, FastaToken and Result. So what should be done with those release?
So a BufRef
(and all the things that contain them) is a reference into a buffer without knowing the buffer type itself. It's simply the equivalent of a,b
in the expression buffer[a .. b]
. The reason it's used is because I don't want to store references to the actual data, that is not important until you want to use it, and it can easily become invalid.
The reason the release
function exists is when you release data from the actual buffer, those numbers a,b
must change to accommodate the new buffer data.
To be able to filter, the range should be more discrete.
I don't see why. A range only cares about the current element. A simple range on top of the nextToken
function is trivially written:
struct FastaRange(Chain)
{
private Chain chain; // the source parser
private FastaToken tok; // the buffer reference
auto front() { return tok.value(chain.window); }
bool empty() { return tok.endPos == 0; }
void popFront()
{
chain.release(tok.endPos);
tok = chain.nextToken;
}
}
Note that front
is going to return a concrete token, and that allocates (the array of fields). Really, I need to develop an abstraction on an item that contains BufRef
to make it so the allocation isn't happening every time you get the value.
How do I access previous items before
nextToken
?
This requires a different mechanism than a range, because a range is only interested in one element.
Hi Steve
Just to be clear, I was not criticizing the design of iopipe
or fasta.d
.
Regarding the release
functions. My confusion stemmed from the difference between iopipe
's release and fastaq
's release:
while(lines.window.strip.empty)
{
// As in code to skip lines:
lines.release(lines.window.length); // This shifts forward the starting point of reference buffer of window.length
lines.extend(0);
}
Whereas in BufRef
it retracts the ending point of reference buffer:
struct BufRef
{
size_t pos;
size_t length;
auto value(B)(B buf)
{ ... }
void release(size_t elements)
{
pos -= elements; //This retracts the end point of ref buf by length `elements`
}
}
Back to the range: I tested the FastRange
and it failed unittest: https://github.com/bioslaD/fastaq/tree/wiprange
Normally I would go back and read more about range, and do some debugging to find out myself. But I've spent more than one hour checking around. It probably takes you much less than that to get through this hurdle. So I will really appreciate if you could take a look. I am struck.
Further more, this usage of the range requires rather extra typing for its users (if I am doing correctly):
auto tokenizer2 = input.tokenParser;
alias ChainType = typeof(tokenizer2);
auto r = FastaRange!ChainType(tokenizer2, tokenizer2.nextToken);
Anyway we can get this instead? auto fastaRange = input.tokenParser
. This would mean to implement range primitives inside the Result
struct, but will greatly simplify its usage.
Just to be clear, I was not criticizing the design of iopipe or fasta.d.
I want to be clear here, I have no ego when it comes to my code that is broken or inferior. Yes I want it to succeed, but I'm OK if it's sub-optimal and there's something better to replace it! You don't have to worry about criticizing my code, ever :)
In particular I'm not in love with how the fasta parser works, I think there are better more efficient ways to do it. It was a quick-and-dirty version that I threw together essentially in one day. Yes it works, but we can do better.
My confusion stemmed from the difference between iopipe's release and fastaq's release
Yes, it may be better to rename the release
function in BufRef
. In previous implementation of BufRef
(namely in the json parser), I did this manually, and it's much more pleasant to have a function that takes care of it, especially when we have a type with lots of them inside (FastaToken
).
My end goal with this pattern is to formally implement it in iopipe for parsing as some sort of utility -- both the ability to use buffer references, and the ability to update them.
Back to the range: I tested the FastRange and it failed unittest:
My guess is this part of your unittest:
auto r = FastaRange!ChainType(tokenizer2, tokenizer2.nextToken);
Here, you are constructing a FastaRange
with a tokenizer that has NOT YET parsed any tokens, and then telling that tokenizer to parse a token. The problem is that your first parameter makes a copy of tokenizer2
and puts it into the resulting range. This one has an empty window!
Then, when you go to get the window of the empty iopipe inside the FastaRange
(the one that was copied), it fails. See my proposed function below for the solution. It's important not to use a token on a different tokenizer when getting the concrete value.
Further more, this usage of the range requires rather extra typing for its users
Right, almost all ranges (and iopipes for that matter) are accompanied with a range-producing function. In D, there is only one way to get type inference for template parameters, and it's with a function. If you want to use UFCS, you need a global function.
So what you need is a function that produces one of these ranges:
auto fastaRange(Chain)(Chain chain)
{
auto tokenizer = chain.tokenParser;
auto result = FastaRange!(typeof(tokenizer))(tokenizer);
result.popFront(); // prime the range, this properly stores the token and advances the iopipe
return result;
}
//Usage:
auto r = input.fastaRange;
I want to be clear here, I have no ego when it comes to my code that is broken or inferior. Yes I want it to succeed, but I'm OK if it's sub-optimal and there's something better to replace it! You don't have to worry about criticizing my code, ever :)
I like that we set the ground firmly and clearly because things might get misinterpreted via written communication.
Yes, it may be better to rename the release function in BufRef. In previous implementation of BufRef (namely in the json parser), I did this manually, and it's much more pleasant to have a function that takes care of it, especially when we have a type with lots of them inside (FastaToken).
I will go back to this in the future.
Here, you are constructing a FastaRange with a tokenizer that has NOT YET parsed any tokens, and then telling that tokenizer to parse a token. The problem is that your first parameter makes a copy of tokenizer2 and puts it into the resulting range. This one has an empty window!
Spot on! And the fastaRange
function works as intended. I got the new unittest passed.
Steve @schveiguy
I am thinking about implementing
filter
function that allows filtering fasta entries by a regex/literal string. This prompts a need to subset the output oftokenParser
function. Therefore thought about convert fasta.q to a InputRange or RandomAccessRange.nextToken
insidetokenParser
looks very much likefront()
, andrelease
is likepopFront
. Additionally, whenpos == 0
, range'sempty
is true. So it seems things are there already. But I have some doubts:release
appears at several levelBufRef
,FastaToken
andResult
. So what should be done with thoserelease
? It looks a like consistent pattern, not a fix think like the relationship ininterface
vs implementation class.nextToken
. How do I access previous items beforenextToken
? Should there be an array to contain them? I still don't know how to return a subset ofResult
that hold a list of discreteFastaToken
orFastaConcreteToken
.