nathell / skyscraper

Structural scraping for the rest of us.

Sqlite database issues, created but no tables or rows. #19

Open olymk2 opened 6 months ago

olymk2 commented 6 months ago

I have been playing around with your awesome library. I am getting along fine with the scraping, but I am seeing some weird behaviour with the database creation.

I have done some digging: the db is being created but it has no size. I have put in some debug statements and can see that the creation and upserting return successfully; however, I cannot open the database and view the table or data. When the database is created, a -journal file is also created alongside it.

It does not look like the execution is done in a transaction, though that seems to be the behaviour I am experiencing. Have I encountered a bug, or is there something wrong with this very simple example that I am missing?
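For anyone who wants to check this symptom independently of Skyscraper, here is a minimal Python sketch using only the standard-library `sqlite3` module. It is not Skyscraper code and does not claim to reproduce what the library does internally. It only illustrates the generic SQLite behaviour suspected above: statements that succeed inside a transaction stay invisible to other connections until the transaction is committed. The file name `demo.sqlite`, the `pages` table, and the helper names are all hypothetical.

```python
import os
import sqlite3
import tempfile


def visible_tables(db_path):
    """Return the table names that a fresh connection can see in db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


def demo(directory):
    """Compare what a second connection sees before and after a commit."""
    db_path = os.path.join(directory, "demo.sqlite")  # hypothetical file name

    # isolation_level=None disables Python's implicit transaction handling,
    # so the only transaction is the one opened explicitly below.
    writer = sqlite3.connect(db_path, isolation_level=None)
    writer.execute("BEGIN")
    writer.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT)")
    writer.execute("INSERT INTO pages VALUES ('http://example.com', 'Example')")

    # Both statements returned without error, but nothing is committed yet.
    before = visible_tables(db_path)
    # Whether a -journal file exists here depends on SQLite's journal mode;
    # the flag is reported rather than assumed.
    journal_present = os.path.exists(db_path + "-journal")

    writer.execute("COMMIT")
    after = visible_tables(db_path)
    writer.close()
    return before, journal_present, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        before, journal_present, after = demo(tmp)
        print("tables visible before commit:", before)
        print("-journal file present before commit:", journal_present)
        print("tables visible after commit:", after)
```

If a database shows the same pattern (successful writes, no visible tables, a lingering -journal file), one plausible explanation is that the writing process never committed or was stopped before it could, though other causes are possible.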

I have forked the repo and added the failing example below, which should reproduce the issue.

https://github.com/olymk2/skyscraper/blob/master/examples/wiki.clj

Would appreciate any help / advice.

olymk2 commented 5 months ago

@nathell Not sure if you're planning on supporting this, but in case it helps anyone, I have been doing some more digging. It seems to be related to the example having a single processor. I have a couple of other working examples which are not much different; the main difference is that they have two defprocessors and two tables with relations. I may explore some more if I get a chance.

Also wondering if I can fake the first page as a workaround.

nathell commented 5 months ago

Hey @olymk2,

First, thanks for your interest in Skyscraper!

I’m definitely going to be looking into this sometime this week but can’t promise an ETA due to other obligations. I’ll reach out when I need any further info.

Thanks again, Daniel

olymk2 commented 5 months ago

Okay great. I may have actually solved it: I was calling the dev version to scrape, i.e. skyscraper.dev/scrape, and I think this is what was causing my issue. I went back and took another look, and noticed that when I used scrape from core my db was populated correctly. I am not sure if this is intended behaviour or a bug.

Either way, I think it's what caught me out.