tusharojha / web_scraper

A very basic web scraper implementation to scrape HTML elements from a web page.
https://pub.dev/packages/web_scraper
Apache License 2.0

How to get and store a list of traversable elements instead of a Map #45

Status: Open. Rizwan-Raza opened this issue 3 years ago.

Rizwan-Raza commented 3 years ago

I have a TABLE with TRs and TDs in it, as we all do. I need to first get a list of TRs (e.g. with 'table > tr') and then fetch the TDs within each TR.

Here is pseudocode for what I want:

List<MyModel> models = [];
// getElements() and the per-element getElementTitle()/getElementAttribute()
// are the requested API; they do not exist in web_scraper yet.
List<Element> elems = webScrapper.getElements("table > tr");
elems.forEach((Element x) {
  String name = x.getElementTitle("td > .name").first;
  String src = x.getElementAttribute("td > .image", "src").first;
  String desc = x.getElementTitle("td > .desc").first;
  models.add(MyModel(name: name, source: src, description: desc));
});
// Now I have a list of models with correctly mapped values.
print(models);

This is what I'm currently doing:

List<MyModel> models = [];

List<String> names = webScrapper.getElementTitle("table > tr > td > .name");
List<String> images = webScrapper.getElementAttribute("table > tr > td > .image", "src");
List<String> descs = webScrapper.getElementTitle("table > tr > td > .desc");

// Now I have trouble combining these 3+ lists into a single list of models.

The problem is that not all TRs have the same 3 TDs: some have 2 and some have only 1. For example, I might get 20 names, 18 images and 8 descriptions, and I can't tell which description or image belongs to which name, because they are in separate lists with no common key.

Help me here, please.
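One possible workaround, until something like getElements() exists, is to parse the page HTML yourself with package:html (the standalone Dart HTML parser; add `html` under dependencies in pubspec.yaml) and query each row separately. This is a minimal sketch that assumes you already have the page's HTML as a string; `MyModel`, `parseRows`, `textIn`, `attrIn` and the sample markup are illustrative only and are not part of web_scraper. Because every field is read from inside its own row, a missing cell becomes `null` for that row instead of shifting the separate lists out of alignment. Note that an HTML5 parser wraps bare `<tr>` elements in an implicit `<tbody>`, so `table > tr` will not match rows in the parsed tree; the sketch uses `table tr` instead.

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

/// Illustrative model mirroring MyModel from the issue. Fields a row may
/// lack are nullable, so a missing cell is recorded as null for that row.
class MyModel {
  final String? name;
  final String? source;
  final String? description;

  MyModel({this.name, this.source, this.description});

  @override
  String toString() =>
      'MyModel(name: $name, source: $source, description: $description)';
}

/// Trimmed text of the first match of [selector] inside [row], or null.
String? textIn(Element row, String selector) =>
    row.querySelector(selector)?.text.trim();

/// Attribute [attr] of the first match of [selector] inside [row], or null.
String? attrIn(Element row, String selector, String attr) =>
    row.querySelector(selector)?.attributes[attr];

/// Builds one model per table row, querying each row independently.
List<MyModel> parseRows(String html) {
  final Document doc = parse(html);
  // 'table tr' rather than 'table > tr': the parser inserts a <tbody>.
  return doc
      .querySelectorAll('table tr')
      .map((Element row) => MyModel(
            name: textIn(row, 'td > .name'),
            source: attrIn(row, 'td > .image', 'src'),
            description: textIn(row, 'td > .desc'),
          ))
      .toList();
}

void main() {
  const html = '''
<table>
  <tr><td><span class="name">A</span></td><td><img class="image" src="a.png"></td><td><span class="desc">first</span></td></tr>
  <tr><td><span class="name">B</span></td></tr>
  <tr><td><span class="name">C</span></td><td><span class="desc">third</span></td></tr>
</table>''';
  parseRows(html).forEach(print);
}
```

The trade-off is that you bypass web_scraper's helpers for this part of the page, but each model stays tied to the row it came from.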

wpsouto commented 3 years ago
List<MyModel> models = [];
List<Element> elems = webScrapper.getElements("table > tr");
var count = 1;
for (var i = 0; i < elems.length; i++) {
      String name  = elems.getElementTitle("td:nth-child($count) > .name");
      String src  = elems.getElementAttribute("td:nth-child($count) > .image", "src");
      String desc  = elems.getElementTitle("td:nth-child($count) > .desc");
      models.add(MyModel(name: name, source: src, description:desc));
      count++;
}
// Now I have a list of models with correctly mapped values.
print(models);
Rizwan-Raza commented 3 years ago

You just rewrote my pseudocode (PSEUDOCODE!), except that you replaced the forEach method with a plain for loop and access the element fields from the parent list instead of the child row.

There is no method like .getElements("pathToSearch") that returns a list of traversable nodes, and that is the only reason I raised this issue. Otherwise, things are good to go. The code that is actually possible with this library today is the second snippet, below the first one.
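For illustration, the requested API could be approximated as Dart extension methods over package:html's `Element`. Everything named here (`RowQueries`, the element-scoped `getElementTitle`/`getElementAttribute`, and the top-level `getElements`) is hypothetical and only mirrors the names in the pseudocode; none of it is web_scraper's API. Returning lists, like web_scraper's existing `getElementTitle`, keeps the familiar shape, but it also means callers must handle rows where a list is empty rather than calling `.first` blindly.

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

/// Hypothetical row-scoped helpers mirroring the pseudocode's method names.
/// NOT part of web_scraper; thin wrappers over package:html's querySelectorAll.
extension RowQueries on Element {
  /// Trimmed text of every descendant matching [selector].
  List<String> getElementTitle(String selector) =>
      querySelectorAll(selector).map((e) => e.text.trim()).toList();

  /// Value of [attribute] on every matching descendant that has it.
  List<String> getElementAttribute(String selector, String attribute) =>
      querySelectorAll(selector)
          .map((e) => e.attributes[attribute])
          .whereType<String>()
          .toList();
}

/// Hypothetical stand-in for the requested getElements(): parses [html]
/// and returns the matching elements as traversable nodes.
List<Element> getElements(String html, String selector) =>
    parse(html).querySelectorAll(selector);

void main() {
  const html = '<table><tr><td class="name">A</td><td class="desc">d</td></tr>'
      '<tr><td class="name">B</td></tr></table>';
  // 'table tr' rather than 'table > tr': the parser inserts a <tbody>.
  for (final row in getElements(html, 'table tr')) {
    final names = row.getElementTitle('.name');
    final descs = row.getElementTitle('.desc');
    // A missing cell yields an empty list, so check before taking .first.
    print('${names.isEmpty ? null : names.first} / '
        '${descs.isEmpty ? null : descs.first}');
  }
}
```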

tusharojha commented 3 years ago

Hi @Rizwan-Raza! Thanks for filing an issue.

I understand your point, and this definitely needs to be resolved. I have added it to my to-do list, and it will be covered in the next update. It would also be great if you could suggest anything or give more information about your issue.