rubycdp / vessel

Fast high-level web crawling Ruby framework
https://vessel.rubycdp.com
MIT License
645 stars, 11 forks

Is the project being maintained? #33

Closed: ricsdeol closed this issue 5 months ago

ricsdeol commented 6 months ago

Hello, I am very interested in the Vessel crawler; I found it simple and productive. However, I have some questions that are not covered in the README or in a more elaborate example.

bundle exec vessel new MyCrawler

How do I run it in PROD mode? What are Fields and Middleware, and how do I use them?

Is lib/helpers the place to put my own classes, for example ones that save to a database or call another API with data from the crawler?

I can help improve the documentation for these features, but I need this initial push to write something more complete.

Thanks. Regards.

route commented 6 months ago

It is! And it's actively used internally; there's just a lack of time to keep the docs updated. In fact, there's even more coming this year. I'll give answers in the next comment.

route commented 5 months ago

@ricsdeol First of all, I need to make a new release and update the README a bit.

  1. VESSEL_ENV=prod bundle exec vessel start crawler_name
  2. Fields are used to declare types and parse values when assigning them. For example, if you create a file containing:
    Vessel::Crawler::FieldType.add(:desc) do |value|
      value.to_s.strip[0...20] + "..."
    end

    and then assign the field in the crawler:

    field :desc, value: xpath(...)

    it will be shortened to 20 chars when assigned.

  3. At the end, fields are passed to the middleware stack, where you can transform them and pass them on to other middlewares. The default middleware is Debug, but for production you could add a SendToProduction middleware that sends the data to a given endpoint at the end.
  4. Helpers are just services or modules that help convert the data; you could create a class and put it there, for instance PriceHelper.
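
To make the flow above more concrete, here is a minimal plain-Ruby sketch of the idea: a value is converted on assignment, a helper cleans up the data, and the item then passes through a chain of middlewares. It deliberately does not use Vessel's own classes, whose exact interfaces are not documented in this thread. DESC_CONVERTER, NormalizePrice, run_stack, and the `call(item)` convention are all assumptions made for illustration; only the names PriceHelper and SendToProduction come from the comment above, and their bodies here are hypothetical.

```ruby
# Plain-Ruby illustration only; none of this is Vessel's actual API.

# Converter in the spirit of the :desc field type: trims the value and
# keeps the first 20 characters, followed by "...".
DESC_CONVERTER = ->(value) { value.to_s.strip[0...20] + "..." }

# Hypothetical helper, the kind of module that could live in lib/helpers.
module PriceHelper
  # Turns a scraped string such as " $1,299.50 " into a Float.
  def self.parse(text)
    text.to_s.gsub(/[^\d.]/, "").to_f
  end
end

# Hypothetical middleware: receives the item hash and returns a
# transformed copy for the next middleware in the stack.
class NormalizePrice
  def call(item)
    item.merge(price: PriceHelper.parse(item[:price]))
  end
end

# Hypothetical final middleware: collects items in memory where a real
# one would send them to an endpoint.
class SendToProduction
  attr_reader :sent

  def initialize
    @sent = []
  end

  def call(item)
    @sent << item
    item
  end
end

# Runs an item through every middleware in order.
def run_stack(middlewares, item)
  middlewares.reduce(item) { |acc, middleware| middleware.call(acc) }
end

sink = SendToProduction.new
item = {
  desc: DESC_CONVERTER.call("  A very long product description here  "),
  price: " $1,299.50 "
}
result = run_stack([NormalizePrice.new, sink], item)
```

In this sketch each middleware returns the item, so the order of the array decides which transformations the final one sees. A real SendToProduction would replace the in-memory array with a request to the target endpoint.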

ricsdeol commented 5 months ago

OK, very cool, thanks a lot!