jamietre / CsQuery

CsQuery is a complete CSS selector engine, HTML parser, and jQuery port for C# and .NET 4.

Razor syntax #2

Closed joecoolish closed 12 years ago

joecoolish commented 12 years ago

I'm working with some Razor files (.cshtml) which are part of the MVC framework, and I'm noticing that if I have text like this:

<div class="MainContainer">
    <h2 class="Visualization">
        Concepts</h2>
    <div id="ConceptsStage">
        <img src="@Url.Content("~/Content/Images/1.gif")" style="margin: auto;" />
    </div>
</div>

or this:

@model IEnumerable<SocialMediaEntities.Concept>
@foreach (var item in Model)
{
    <a style="float: left; padding: 5px;" href="javascript:LoadPins('@item.Display')">
        @Html.DisplayFor(modelItem => item.Display)
    </a>
}
<div id="Temp" class="clear">
</div>

the documents get corrupted. Is there any way to extend Razor support?

jamietre commented 12 years ago

It's really only intended to support valid HTML syntax. While it would certainly be possible to extend CsQuery to basically pass through the Razor syntax, that is kind of mixing apples and oranges. By the way, the first example will work if you use single quotes to delimit the attribute value. Not that this helps in the big picture, but I thought I would mention it :)

<img src='@Url.Content("~/Content/Images/1.gif")' style="margin: auto;" />

I am not sure what the context is. Are you preprocessing the files for some reason? If you are using it in real time, what I normally do with MVC is intercept the HTML stream after the Razor engine is done with it. There are a number of ways you could do this, e.g. with a custom view/view engine. Simple example of doing this:

http://stackoverflow.com/questions/8642148/how-to-intercept-view-rendering-to-add-html-js-on-all-partial-views

Example of inserting CsQuery in the middle of the output process:

// views can override this method to be notified that the output has been parsed into "Dom" 
// and can now be manipulated
protected virtual void OnCsQueryRender() {

}

protected CQ Dom {get;set;}
protected override void RenderView(ViewContext viewContext, TextWriter writer, object instance)
{
    var sb = new StringBuilder();
    var sw = new StringWriter(sb);

    // render the view into our local writer
    base.RenderView(viewContext, sw, instance);
    Dom = CQ.Create(sb.ToString());

    // notify client code that "Dom" has been created & it can do stuff
    OnCsQueryRender();

    // write it to the output stream
    writer.Write(Dom.Render());
}

If that is not what you need to do, and you really need to parse Razor-formatted files BEFORE the engine does its thing, I can see what would be involved in updating the parser with that option. It's probably not completely simple though, since as it is now, everything is pure HTML, bounded by angle brackets. Even though I just need to ignore the Razor syntax, this would change input handling significantly, since the parser has to be able to recognize when a Razor block has been closed, meaning I have to actually parse it at some level.

joecoolish commented 12 years ago

Thank you for your response!

My usage is actually for a "Code Cleanup" utility that I'm developing. I have several MVC projects that I want to clean up, mainly to move all inline style markup into CSS classes and to have any bind-* attributes added dynamically by jQuery.

http://stackoverflow.com/questions/10558483/automate-text-replace-in-visual-studio

That is a better explanation of what I'm trying to do.

I figured that I could either write a ton of regexes, or utilize something that handles the actual DOM.
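
For illustration, the DOM-based route for the inline-style part might look roughly like the following. This is only a sketch: the `InlineStyleCleanup` class and the generated `extracted-style-N` class names are hypothetical, and it assumes the input is plain HTML (no Razor syntax) and that the CsQuery package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using CsQuery;

// Hypothetical cleanup helper; not part of CsQuery itself.
public static class InlineStyleCleanup
{
    // Moves every inline style attribute into a generated CSS class.
    // Returns the rewritten HTML; the matching rules are appended to 'css'.
    public static string Extract(string html, StringBuilder css)
    {
        CQ dom = CQ.Create(html);

        // Identical style strings share one generated class.
        var classByStyle = new Dictionary<string, string>();

        // ToList() takes a snapshot so the loop is unaffected by the edits.
        foreach (IDomObject element in dom["[style]"].ToList())
        {
            string style = element.GetAttribute("style").Trim();
            string className;
            if (!classByStyle.TryGetValue(style, out className))
            {
                className = "extracted-style-" + classByStyle.Count;
                classByStyle[style] = className;
                css.AppendLine("." + className + " { " + style + " }");
            }
            element.RemoveAttribute("style");
            element.AddClass(className);
        }
        return dom.Render();
    }
}
```

Keep in mind that a class rule has lower specificity than an inline style, so the rendered page can change if other stylesheet rules target the same elements.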

As for extending the existing functionality, I am noticing that it is the DomElementFactory that handles the actual Html string parsing, is that correct? I haven't had time to really go through the code too much, but if you think it won't be too hard for me to extend the existing tokenizer to accept Razor syntax, I might go ahead and do it.

I figure I have about 2-3 hours of replacing text, so if it takes me 4-6 hours to develop this utility, it will be worth it. Especially since I'll be able to reuse it on any other new project I'm assigned to.

If you think the value of me adding Razor to the project isn't going to bring me much in terms of time saving, I'll start down the slippery slope of Regex.

Either way, thank you for your work. It is great to see a non-regex dom parser!

jamietre commented 12 years ago

It does seem like it would be a nice way to do it... trying to parse HTML of any substance with regex usually goes bad pretty quickly ;)

I think this would be a useful tool, but in thinking about it a little more, I bet it's not that trivial. You really would have to parse the Razor syntax fully, because you need to be able to identify not just Razor, but more HTML inside a Razor structure... and so on.

Unfortunately the HTML parser is the oldest part of this project and not the greatest code in the world; it's never really had much attention since day 1. It's pretty hard to understand, I mean, even for me, and I wrote it. I think if I were going to tackle this directly, I would start by rewriting the parser to use a lexical scanner instead of hardcoded rules that are designed only around HTML tags as it is now. There are already a lot of logic branches and I think it would get ugly really fast to add in a new set of rules that aren't based on the HTML tag structure.

I think a better option would be to take a two-step approach: you could write a separate parser to replace all the Razor code with placeholders that the HTML parser will just ignore, e.g.

and then replace it again afterwards, which would be pretty easy with CsQuery. You would still have to essentially write a Razor parser, but at least you would not have to integrate it with the existing HTML parser.
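
A minimal sketch of that two-step idea, under heavy assumptions: the `RazorMask` class and the `__RAZOR_n__` token format are made up for illustration, and only simple inline expressions such as `@item.Display` or `@Url.Content("...")` are recognized. Block constructs like `@foreach`, `@{ ... }`, or the `@model` line are not handled properly, which is exactly where a real Razor parser would be needed.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of CsQuery: hides simple inline Razor
// expressions from an HTML-only parser and puts them back afterwards.
public static class RazorMask
{
    // Replaces each recognized expression with a token like __RAZOR_0__
    // and stores the original text in 'saved' at the same index.
    public static string Mask(string source, List<string> saved)
    {
        var output = new StringBuilder();
        int i = 0;
        while (i < source.Length)
        {
            // An '@' right after a letter or digit is likely an e-mail address.
            bool afterWord = i > 0 && char.IsLetterOrDigit(source[i - 1]);
            int end = (source[i] == '@' && !afterWord) ? ExpressionEnd(source, i) : -1;
            if (end < 0) { output.Append(source[i]); i++; continue; }

            output.Append("__RAZOR_").Append(saved.Count).Append("__");
            saved.Add(source.Substring(i, end - i));
            i = end;
        }
        return output.ToString();
    }

    // Puts the saved expressions back in place of their tokens.
    public static string Restore(string html, List<string> saved)
    {
        var result = new StringBuilder(html);
        for (int n = 0; n < saved.Count; n++)
            result.Replace("__RAZOR_" + n + "__", saved[n]);
        return result.ToString();
    }

    private static bool IsIdentStart(char c) { return char.IsLetter(c) || c == '_'; }

    // Returns the index just past the expression that starts at '@',
    // or -1 when no identifier follows the '@'.
    private static int ExpressionEnd(string s, int at)
    {
        int i = at + 1;
        if (i >= s.Length || !IsIdentStart(s[i])) return -1;
        while (true)
        {
            while (i < s.Length && (char.IsLetterOrDigit(s[i]) || s[i] == '_')) i++;
            if (i < s.Length && s[i] == '(') i = SkipParens(s, i);
            // Follow member access only when an identifier comes after the dot.
            bool member = i + 1 < s.Length && s[i] == '.' && IsIdentStart(s[i + 1]);
            if (!member) return i;
            i++;
        }
    }

    // Returns the index just past the ')' matching the '(' at 'open',
    // or 'open' itself when the parentheses are unbalanced.
    private static int SkipParens(string s, int open)
    {
        int depth = 0;
        for (int i = open; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '"')
            {
                // Skip a string literal so parentheses inside it are not counted.
                i++;
                while (i < s.Length && s[i] != '"') { if (s[i] == '\\') i++; i++; }
            }
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i + 1;
        }
        return open;
    }
}
```

The intended flow would be to call `Mask`, hand the masked string to `CQ.Create`, make the DOM changes, and pass the `Render()` output through `Restore`. This also assumes the token text never occurs in the source file and survives the HTML parser unchanged.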

Whether you think that's worth the trouble, I can't say...

jamietre commented 12 years ago

Closing this - I don't think there's much to do at this point.

I did notice that HtmlAgilityPack has code that is designed to parse a number of different HTML formats. It would certainly be possible to adapt their parser to build a tree using CsQuery's DOM instead of XML nodes. It should be really easy to add an alternate parser into CsQuery since the DomElementFactory class just returns a new DOM tree.