rflechner / ScrapySharp

reborn of https://bitbucket.org/rflechner/scrapysharp
MIT License

Issue parsing empty select #2

Closed: balexandrov closed this issue 5 years ago

balexandrov commented 6 years ago

If there is a select input without options (populated later by script), the form parser throws an exception. The problem is in PageWebForm.cs, in ParseFormFields. Here I've added some null checks for value.

var selects = from @select in node.CssSelect("select")
              let name = @select.GetAttributeValue("name")
              let option =
                  @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                  @select.CssSelect("option").FirstOrDefault()
              let value = (option == null) ? null : option.GetAttributeValue("value")
              select new FormField
              {
                  Name = name,
                  Value = string.IsNullOrEmpty(value) ? option == null ? "" : option.InnerText : value
              };

rflechner commented 6 years ago

Could you please provide an HTML sample?

balexandrov commented 6 years ago

I can't find the exact page that triggered this right now, but the HTML was an empty select element with no options in it, i.e. <select></select>

rflechner commented 5 years ago

Sorry, I cannot reproduce with a test:

        [Test]
        public void When_parsing_empty_select_tag()
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(@"<html><body><select></select></body></html>");
            var node = doc.DocumentNode;

            var selects = (from @select in node.CssSelect("select")
                let name = @select.GetAttributeValue("name")
                let option =
                    @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                    @select.CssSelect("option").FirstOrDefault()
                let value = (option == null) ? null : option.GetAttributeValue("value")
                select new 
                {
                    Name = name,
                    Value = string.IsNullOrEmpty(value) ? option == null ? "" : option.InnerText : value
                }).ToArray();

            Assert.AreEqual(1, selects.Length);
        }

balexandrov commented 5 years ago

I've just checked. This code is present in two files, PageWebForm.cs and WebForm.cs. I hit this in PageWebForm when loading a page containing such HTML, but it must be fixed in the other place too. The original code is missing the null check "let value = (option == null)", and option is clearly null when the select has no option elements.

This is the original code:

var selects = from @select in node.CssSelect("select")
              let name = @select.GetAttributeValue("name")
              let option =
                  @select.CssSelect("option").FirstOrDefault(o => o.Attributes["selected"] != null) ??
                  @select.CssSelect("option").FirstOrDefault()
              let value = option.GetAttributeValue("value")
              select new FormField
              {
                  Name = name,
                  Value = string.IsNullOrEmpty(value) ? option.InnerText : value
              };