tfonteyn / NeverTooManyBooks

A book collection app for Android
GNU General Public License v3.0

Support bertrand.pt or Wook.pt as data source #15

Closed maverick74 closed 11 months ago

maverick74 commented 1 year ago

Please add support for bertrand.pt as data source for Portugal :)

tfonteyn commented 1 year ago

For reference: Bertrand is the world's oldest bookshop still in operation: Livraria Bertrand - Wikipedia

maverick74 commented 1 year ago

from what i understand, all books available in bertrand are also in wook and "vice-versa".

The same thing cannot be said about bertrand and portoeditora...

but, if wook has a human-validation system that is not in bertrand

(it may be the case... I don't know if it was acquired after being a business or if it was born from bertrand, so... independently of who owns who, each bookshop may be its own thing - sharing just the same book DB)

bertrand is perfectly fine :)

maverick74 commented 1 year ago

I took a look at the original app, Book Catalogue, from which you forked. The app returned all 3 of my test books with most info right!

Maybe you can use some of the code. Or the same sources.

tfonteyn commented 1 year ago
use some of the code

no thanks... I took the concept of their booklist (builder) code but rewrote (by gradually replacing all code) pretty much the whole app.

same sources.

they use a private server. I can't/won't comment further.

maverick74 commented 1 year ago

pretty much the whole app.

Ahah! That's the best "kind" of rewrites ahahaha

I can't/won't comment further.

ehehe won't touch that subject again ehehe XP

maverick74 commented 11 months ago

@tfonteyn If i can help in any way on this or for testing, let me know about it :)

tfonteyn commented 11 months ago

I didn't forget you :) but it's good you called out, as I had left it a bit long 👍

I've got it working with the bertrand site. Searching via ISBN and parsing is already working fine. I've written code tests with Portuguese books: 9789720048820 + 9789899087774 to get the basics working.

But other random tests are working fine as well.

So the hard work is done. But I need to find/add some more books to the test to check for other details. One example:

There are some more things like that I need to find/add.

Roughly speaking.... 95% of the job is done, just the last 5% are some annoying details :)

I'll probably get a beta ready soonish

maverick74 commented 11 months ago

But I need to find/add some more books to the test to check for other details.

If you need, I can try to help by finding some examples.

  • Capa integral

It's not something much common for sure... I have an idea of what it might be but not 100% sure. I'll check it and I'll try to get you an example of it.

tfonteyn commented 11 months ago
If you need, I can try to help by finding some examples.

If you have non-common / obscure examples, those would be welcome.

Capa integral: It's not something much common for sure

in that case, don't bother, I was only looking for the most common ones (i.e. soft and hardcover mole/dura)

tfonteyn commented 11 months ago

actually.... that reminds me: I've been adding better support for amazon.es (to which amazon.pt redirects). Same here: if you have non-common / obscure examples, those would be welcome.

tfonteyn commented 11 months ago

if you like, you can pick up the beta here: [link removed]

Please note that it involves a (minor) database update; so as always: make a full backup first!

I'm already running it on my personal devices. I'm still fiddling a bit with it, but depending on your feedback, I'll push a release soon.

maverick74 commented 11 months ago

non-common / obscure examples

i don't own much of those, i'm afraid... but will double check asap.

4.5.0-beta1

I don't see a way to enable/disable the bertrand website. Otherwise, it seems to be working great :)

tfonteyn commented 11 months ago

oh heck... some days I can be soo stupid... I forgot to globally enable Bertrand.

Here is 4.5.0-beta2

My apologies.

maverick74 commented 11 months ago

Two errors:

and

Recent books:

  • 9789725640418 divina comédia
  • 9789725686317 contos fantásticos
  • 9789898939425 Peter pan

Old books:

  • 9724129128 sete anos no Tibet (gets the wrong book)
  • 8481304964 Siddhartha (gets the wrong book)
  • 9722472540 Congo (does not recognize ISBN as valid)
  • 8481302333 o hóspede de Drácula
  • 9892000285 contos do além
  • 9726116562 a ilha do tesouro
  • 9726115612 as viagens Gulliver
  • 9726116082 romeu e Julieta
  • 9722108514 as aventuras de Pinóquio
  • 9724226549 estranho mas verdadeiro

The ones above might be useful for testing...

tfonteyn commented 11 months ago
403 accessing Bertrand.PT

You got to be joking.... and indeed, my automated test also fails today while it worked fine over the last couple of weeks.

and a manual test:

wget https://www.bertrand.pt/pesquisa/9789899087774
 ...   
HTTP request sent, awaiting response... 403 Forbidden

well, that's a lot of hours/days of work out of the window. I can only presume they either updated the site last night and/or they detected my testing and decided to block robots (which in effect is what my app is). I'll do some workaround tests, but this is a big [bleep] and likely now a dead-end.

As to the ISBN's:

9722472540 Congo (does not recognize ISBN as valid)

it's not; the valid one would be 9722472542. The last digit is a checksum, see: https://en.wikipedia.org/wiki/ISBN#ISBN-13_check_digit_calculation

If an ISBN is printed wrong in the book, you can try replacing the last digit (which is how I found the '2'). Yes, publishers make mistakes :) You can still search on invalid codes if, in the menu, you disable the "strict isbn" setting.
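To illustrate the checksum point for a 10-digit ISBN like the one above, here is a minimal Python sketch. It is not the app's code; the function names `isbn10_check_digit` and `is_valid_isbn10` are made up for this example. It uses the standard ISBN-10 rule: weight the first nine digits from 10 down to 2, and pick the last character so the total is divisible by 11 ('X' stands for 10).

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Return the ISBN-10 check character for the first nine digits."""
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine digits")
    # Weights run from 10 down to 2 across the nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """True when the tenth character matches the computed check character."""
    isbn = isbn.replace("-", "").replace(" ", "").upper()
    if len(isbn) != 10:
        return False
    try:
        return isbn10_check_digit(isbn[:9]) == isbn[9]
    except ValueError:
        return False


if __name__ == "__main__":
    # The misprinted number from the book, and the corrected one.
    print(is_valid_isbn10("9722472540"))
    print(is_valid_isbn10("9722472542"))
    print(isbn10_check_digit("972247254"))
```

Computing the check character from the first nine digits gives the same result as "try replacing the last digit", without the trial and error.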

Old books with 10-digit ISBNs

These are often no longer found on shopping websites (like amazon) as they are books which are no longer in print. Sometimes they do show up in amazon marketplace sales. They do show up on catalogue sites like goodreads and librarything, which have blocks in place to prevent their usage. When those catalogue sites are not owned by a large/American corporation they usually work well. Goodreads+Librarything are owned by Amazon...

It sometimes helps to convert them to 13 digits. I've not really documented it, but when you enter a valid 10-digit number and then tap the green checkmark, it will convert it to the 13-digit format. But it depends on that book having been reprinted with the converted number and the website(s) having it.
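For anyone wanting to do the conversion by hand, here is a rough Python sketch of the usual ISBN-10 to ISBN-13 conversion. It is not taken from the app, and `isbn10_to_isbn13` is a hypothetical name. The idea: drop the old check character, put "978" in front, and compute a new check digit using alternating weights of 1 and 3.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13 form.

    Assumption: the input is not re-validated here; the old check
    character (digit or X) is simply dropped.
    """
    isbn10 = isbn10.replace("-", "").replace(" ", "").upper()
    body = isbn10[:9]
    if len(isbn10) != 10 or not (body.isascii() and body.isdigit()):
        raise ValueError("expected a 10-character ISBN")
    first_twelve = "978" + body
    # Weights alternate 1, 3, 1, 3, ... over the twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first_twelve))
    check = (10 - total % 10) % 10
    return first_twelve + str(check)


if __name__ == "__main__":
    print(isbn10_to_isbn13("9722472542"))
```

As said above, the converted number only helps if a site actually lists the book under it.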

gets the wrong book

sadly, if the website (or the number in the book) is wrong, not much we can do about it. An example:

8481304964 Siddhartha

this seems to be an ISBN which has been used multiple times. It does match the Hermann Hesse book, but it also matches https://www.abebooks.co.uk/servlet/SearchResults?kn=8481304964&sts=t&cm_sp=SearchF-_-topnav-_-Results&ds=20 (side note: abebooks is ALSO owned by amazon)

maverick74 commented 11 months ago

Goodreads+Librarything are owned by Amazon...

abebooks is ALSO owned by amazon

$%&/@£§!!! B€zos ain't messing around, is he?! I'm foreseeing they'll soon want to acquire my underwear as well!!!

What's the next best thing (that is Global, opensource and not owned by amazon)???

funny detail: i have 2 books that do not have any ISBN... i thought they had to have to get published, but apparently not...

tfonteyn commented 11 months ago

I've spent a couple of hours to get to the bottom... but it's fatal. The blocking is due to Cloudflare; see Wikipedia.

4.5 will be released without bertrand. But upgrading from your beta version will be fine (the database thing was not related, and still needed in 4.5)

What's the next best thing

I suggest you take a look at Portuguese library sites. For reference, the KBNL source is the "royal library of the netherlands". Perhaps something similar exists in Portugal?

2 books that do not have any ISBN.

Only 2? I have hundreds... ISBN use only started in 1970 or even later. Books from before that don't have official ISBN numbers.

I'm closing this issue now as with Cloudflare it's a lost cause. But feel free to open a new issue if you have another proposal.

One tip: try the "wget http://url" command in a terminal/command-line. If it reports 403, then forget about that site.
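If wget isn't at hand, roughly the same check can be sketched in Python with the standard library. This is only an illustration: `http_status` and `looks_blocked` are made-up names, and since urllib sends its own User-Agent, a site may answer it differently than it answers wget or the app.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return the HTTP status code, error codes included."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return err.code


def looks_blocked(status: int) -> bool:
    """Apply the rule of thumb from the tip: 403 means forget about the site."""
    return status == 403


if __name__ == "__main__":
    # Same URL as the manual wget test earlier in this thread.
    status = http_status("https://www.bertrand.pt/pesquisa/9789899087774")
    print(status, "blocked" if looks_blocked(status) else "worth a try")
```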

maverick74 commented 11 months ago

I have hundreds

In the meantime I found about 6 or 7 more books and another 7 ZX Spectrum manuals/books which do not have one either...

But nothing in the hundreds ahahahah

I suggest you take a look at Portuguese library sites.

I'll see if I can get a decent alternative. If I do, I'll open a new issue