Closed bambams closed 9 years ago
Hello
I was sure that there is no problem in this pull-req :)
I will be happy to merge.
Thanks.
Thanks. I will try to finalize something and submit an official pull request when I'm ready. Or you can feel free to merge anytime you wish and I'll work around the upstream history.
I am also considering replacing the regex <input>
injection with a proper HTML parser. In theory, one of the <form>
attributes could contain >
and break the regex... I just am not familiar with any CPAN HTML parsers and I'm having trouble finding an appropriate one to use.
Is not recommended because there is influence on Performance Using HTML parser. I will fix if you give me fail to the test case.
Yes, performance was a consideration of mine as well. In my experience, parsing SGML based dialects is faster than you'd imagine. I have implemented Web applications that parse complex, user uploaded HTML data (albeit, with .NET) and the performance hit is negligible. I'd imagine that would be true in this case as well.
My concern is that it isn't possible to parse all possible HTML markup with just a regular expression. It will come down to whether it has to always work or whether performance matters more than working every time.
I will try to put together some test cases that break the regular expression and we can go from there. :)
If you're sure you want to keep the regular expression parsing I was thinking that perhaps we could add a configurable branch on the HTML transformation to choose the desired implementation. Depending on how popular the module is, we can either leave the legacy behavior as default and require a constructor argument to enable a proper HTML parser for users that need or want it; or visa-versa. E.g.,
Plack::Session::State::URI->new(session_key => 'sid', html_parser => 1);
or
Plack::Session::State::URI->new(session_key => 'sid', html_regex => 1);
That way the user can do what is necessary, and we can choose an appropriate default for the users that don't care. The performance hit will be optional. :) I'll try to figure out the tests and create some nasty HTML samples. :)
html_parser => 1
It is a good idea!!
I am using HTML::StickyQuery(HTML::Parser) for <a href="...">.
https://github.com/s-aska/p5-Plack-Session-State-URI/blob/master/lib/Plack/Session/State/URI.pm#L46
It might be best to together <form> and <a>.
Performance than now to improve :)
I see that HTML::StickyQuery
is meant to add parameters to all <a>
tag hyperlinks, and it is based on HTML::Parser
. However, it appears that HTML::StickyQuery
doesn't provide any hooks into the parsing process so that we could add <form>
too. It seems that HTML::FillInForm
might do exactly what we need for the form, and it too uses HTML::Parser
. We just have to figure out how to use both without having to parse twice. Perhaps it will require sending patches upstream to these two other modules. Studying the code of each, it looks to me like the base HTML::Parser
just calls a given method for start tags, end tags, comments, text, etc., and these process the markup data and concatenate to the $self->{output}
string.
Firstly, it sounds very inefficient to concatenate onto the output string that many times (it would probably create a ton of temporary strings, whereas a smarter, lower-level interface to Perl strings could concatenate them all at once). That seems like it might be a design flaw of HTML::Parser
, but I'm not familiar with the alternatives in Perl (e.g., in .NET you have a StringBuilder
class).
Secondly, this makes it difficult to combine both effects without parsing twice. Instead of manipulating the tag text through an API, allowing multiple processes to have their turn, they just assume that they own the entire parser and modify the output string. Perhaps it would work though if we implemented a wrapper that checked the tag name and dispatched to each type of parser's methods... E.g., (this is a big hack just to demonstrate a POSSIBLE idea)
package HTML::StickyParams;
use parent qw/HTML::Parser/;
use HTML::FillInForm;
use HTML::StickyQuery;
sub new {
my ($class) = @_;
my $fif = HTML::FillInForm->new( ... );
my $stq = HTML::StickyQuery->new( ... );
my $self = bless { %$fif, %$stq }, $class;
$self->{fif_} = bless $self, ref $fif;
$self->{stq_} = bless $self, ref $stq;
return $self;
}
sub fif_ { shift->{fif_} }
sub stq_ { shift->{stq_} }
sub start {
my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
my $child = do {
if($tag_name eq 'form') {
$self->fif_;
} elsif ($tag_name eq 'a') {
$self->stq_;
}
} or return;
$child->start(@_);
}
In theory something like that could work, but it would be a huge hack that could break when the internals of either package changed, and it would also depend on merging the state of HTML::FillInForm
and HTML::StickyQuery
into our own HTML::Parser
subclass and hoping that nothing collides...
A cleaner approach might be to patch both packages, split their methods (e.g., start
) into two: one that describes the work to do, and another that actually does it, preferrably through a standard method that translates some kind of object representation into the actual changes. But that's quite a lot of effort for this little bit. It might be easier to just copy both modules and merge them ourselves into a new package that doesn't depend on either. Then we'd control the code and they couldn't break it on us. The downside is duplicated code...
I hope that there's a much easier way that I'm missing...
I think that it is good that approach.
Hello,
As mentioned in the other issue I have been assigned this CPAN module as a pull request challenge assignment for January. I have begun attempting to tidy things up where I can. The work is currently a work in progress in my
wip/tidy
branch, which I am overwriting regularly (push -f).A snapshot has been pushed to
preview/tidy/1
(https://github.com/bambams/p5-Plack-Session-State-URI/tree/preview/tidy/1) which I will leave intact for your review (it is static). Please review it and let me know if there are any changes you like, changes you hate, and if there's some common ground to be met. I'd really appreciate you helping to steer me in the correct direction so my time isn't wasted and neither is yours. If there are things you want, but others you don't, let me know and I'll try to do the work of rebasing the branch until you're happy with it. Thanks.