mqAncientHistory / Lat-Epig

The Lat-Epig interface allows you to query the EDCS and save the search result in a TSV file and plot the results on a map of the Roman Empire without any prior knowledge of programming.
https://mybinder.org/v2/gh/mqAncientHistory/Lat-Epig/HEAD?urlpath=notebooks/EpigraphyScraper.ipynb
GNU General Public License v3.0

Changing Inscription Genus error #30

Closed EwanSC closed 3 years ago

EwanSC commented 3 years ago

Changing Inscription Genus means that the generated TSV will not open. A new window opens with the error:

"400 Bad Request nginx/1.19.2"

Query: Province:Dalmatia Inscription Genus...: milites

Denubis commented 3 years ago

I can't replicate. Can you provide more details? (How you're using the scraper, browser, urls if any, etc?)

2021-08-21-province_Dalmatia+inscription_genusmilites+term1%-841.zip

EwanSC commented 3 years ago

Error URL: https://hub.gke2.mybinder.org/user/mqancienthistory-lat-epig-cotfdu05/voila/render/output/2021-08-21-province_Dalmatia+dating_to_100+inscription_genus_milites+term1_%-250.tsv

Browser: Chrome

Version 92.0.4515.159 (Official Build) (64-bit)

Denubis commented 3 years ago

Huh. ok, will investigate with less alcohol

Denubis commented 3 years ago

How tedious. Can't replicate on local binder, or local docker. Have you managed to cause this with any other searches?

EwanSC commented 3 years ago

Interesting... I will have a try again tomorrow, also with less alcohol

Denubis commented 3 years ago

Ok, fixed by shortening filename (and/or by removing the %).

@petrifiedvoices, @EwanSC can we figure out the maximum set of search terms that will still return something? I want to see if it's a % or filename length.

Also, @petrifiedvoices, @RayLaurence should we have 2021-08-21-prov_Dalmatia+igenus_milites-841.tsv or 2021-08-21-EDCS_via_Lat_Epig-prov_Dalmatia+genus_milites-841.tsv ?

EwanSC commented 3 years ago

Happy to do that

EwanSC commented 3 years ago

Confirming that the search with Inscription Genus now works for me with the short title changes.

Re filename size issue:

This filename was not too long: 2021-08-22-prov_DaciaDalmatiaPalaestina+term2_milit+from_1+to_200+genus_AugustiAugustaecarminainscriptioneschristianaelegeslibertilibertaelitteraeerasae+not_genus_litteraeinlituramiliariamilitariamilitesmulieresnomensingulare+term1_miles-0.tsv

Search: 2 text terms, 3 province, date to, date from, 6 include genus, 6 exclude genus - result: 243 character filename

This filename was too long [Errno 36]: 2021-08-22-prov_DaciaDalmatiaPalaestina+term2_milit+from_1+to_200+genus_AugustiAugustaecarminainscriptioneschristianaelegeslibertilibertaelitteraeerasaelitteraeinlitura+not_genus_miliariamilitariamilitesmulieresnomensingulareofficiumprofessio+term1_miles-0.tsv

Search: 2 text terms, 3 province, date to, date from, 7 include genus, 6 exclude genus - result: 260 character filename

So I guess the limit is somewhere between these two? 250?
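
For context: on Linux, Errno 36 is ENAMETOOLONG, and most Linux filesystems cap a single filename at 255 bytes, which would fit the 243 (fine) and 260 (too long) observations above. Below is a minimal sketch of a check that could run before writing the file. The 255-byte figure is an assumption about the filesystem behind the output directory, and `name_max` and `fits_filename_limit` are hypothetical helpers, not part of Lat-Epig.

```python
import os

# Assumed limit: 255 bytes is typical for Linux filesystems such as ext4.
ASSUMED_NAME_MAX = 255


def name_max(directory="."):
    """Ask the OS for the filename limit of `directory`.

    Falls back to the assumed value where os.pathconf is unavailable
    (e.g. Windows) or the query fails.
    """
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        return ASSUMED_NAME_MAX


def fits_filename_limit(filename, limit=ASSUMED_NAME_MAX):
    """Return True if the UTF-8 encoded filename fits within `limit` bytes."""
    return len(filename.encode("utf-8")) <= limit
```

The limit is counted in bytes rather than characters, so any non-ASCII character in a filename uses up more of the budget than its visible length suggests.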

Denubis commented 3 years ago

So, to you all, how do we encode this into a filename that is less than uh... lots characters? What do you need to know? Are there standard genus abbreviations?

EwanSC commented 3 years ago

Whilst I think Ray will have a lot more to say on this, perhaps we could limit to three genus in the filename and then a generic '+(number)more' or something when there are more than three? I think most people using it would change the filename anyway, but this is just a hunch. Another option would maybe be reducing the expressions with 'tituli -' at the beginning to just 't-'. All just brainstorming...
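
The "first three, then '+(number)more'" idea could be sketched roughly as follows. This is only an illustration: `summarise_terms` is a hypothetical helper, the `-` separator is an assumption, and stripping spaces and slashes mirrors what the existing filenames appear to do (e.g. `AugustiAugustae`).

```python
def summarise_terms(terms, keep=3, sep="-"):
    """Join the first `keep` terms and replace the rest with '+<n>more'.

    Spaces and slashes are removed so each term is filename-friendly.
    """
    cleaned = [t.replace(" ", "").replace("/", "") for t in terms]
    head = sep.join(cleaned[:keep])
    extra = len(cleaned) - keep
    return f"{head}+{extra}more" if extra > 0 else head


# Five genus terms collapse to three names plus a count.
print(summarise_terms([
    "Augusti/Augustae", "carmina", "inscriptiones christianae",
    "leges", "liberti/libertae",
]))
```

With three or fewer terms the names are joined unchanged, so short searches keep fully descriptive filenames.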

Denubis commented 3 years ago

Given a query of:

$ ./src/lat_epig/parse.py -o or -v Dacia -v Dalmatia -v Palaestina -t milit -df 1 -dt 200 -ig "Augusti/Augustae" -ig carmina -ig "inscriptiones christianae" -ig "leges" -ig "litterae erasae" -ig "litterae in litura" -ig "miliaria" -ig "militaria" -ng "mulieres" -ng "nomen singulare" -ng "ordo decurionum" -ng "reges diplomata" -ng "ordo equester" -ng "titul fabricationis" -ng "tituli honorarii" -ng "tituli prossessionis" miles

What do you think of it outputting: '2021-08-22-EDCS_via_Lat_Epig-term1_miles+op_or+term2_milit+prov_Dacia|Dal᷃|Pal᷃+from_1+to_200+genus_Aug᷃|car᷃|ins᷃|leges|lit᷃|litteraeinlitura|mil᷃|militaria+not_genus_mul᷃|nom᷃|o+++-0.json' and '2021-08-22-EDCS_via_Lat_Epig-term1_miles+op_or+term2_milit+prov_Dacia|Dal᷃|Pal᷃+from_1+to_200+genus_Aug᷃|car᷃|ins᷃|leges|lit᷃|litteraeinlitura|mil᷃|militaria+not_genus_mul᷃|nom᷃|o+++-0.tsv'

(I think that's the right unicode to indicate abbreviation?) I also have a json dump because we have enough metadata that encoding it in the filename is too much.
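
A rough sketch of the JSON side of this: keep the filename short and write the full query next to it as a sidecar file. The dictionary keys and the `write_query_metadata` helper are hypothetical, chosen only for illustration, and do not reflect the actual Lat-Epig output schema.

```python
import json
from pathlib import Path


def write_query_metadata(stem, query, directory="."):
    """Write the full search parameters to '<stem>.json' and return the path.

    ensure_ascii=False keeps any non-ASCII search terms readable in the file.
    """
    path = Path(directory) / f"{stem}.json"
    path.write_text(
        json.dumps(query, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return path


# Hypothetical query record; the real field names may differ.
example_query = {
    "term1": "miles",
    "operator": "or",
    "term2": "milit",
    "province": ["Dacia", "Dalmatia", "Palaestina"],
    "date_from": 1,
    "date_to": 200,
}
```

Because the complete query lives in the JSON, the filename only needs to be unique and recognisable rather than exhaustive.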

Denubis commented 3 years ago

Here's what it'd look like:

[Screenshot: Screenshot_20210822_151258_Selection_001]

If there are better unicode combining marks, let me know? (or if there's a better way to show abbreviations?) If we're happy with this, I'll start moving the map stuff to run on JSON rather than TSV, since it makes showing the metadata in the legend less sucky.

EwanSC commented 3 years ago

It seems the filename may also struggle with '%', as this output also can't be opened in my Chrome. I get a 400 Bad Request: 2021-08-24-EDCS_via_LatEpig-term1%+prov_Dalmatia+to_200+genus_militaria|milites-0.tsv (JSON)

Prov. Dalmatia Genus include: milites, militaria date to: 200

Binder: https://hub-binder.mybinder.ovh/user/mqancienthistor-scrapernotebook-9kj3l4ba/voila/render/EpigraphyScraper.ipynb?token=x-lNs6O2QGGGX65OKFiu6Q

Bad request 400: https://hub-binder.mybinder.ovh/user/mqancienthistor-scrapernotebook-9kj3l4ba/voila/render/output/2021-08-24-EDCS_via_Lat_Epig-term1_%+prov_Dalmatia+to_200+genus_militaria%7Cmilites-0.tsv

Denubis commented 3 years ago

Thanks... that's useful. (I thought I had fixed it but... no, that's useful.)

Denubis commented 3 years ago

Oh, duh. Ok, there's an nginx proxy in front, which means the % isn't being escaped properly. Which I should remove anyways...
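
To illustrate the problem: in a URL path, `%` introduces a percent-escape and must be followed by two hex digits, so a bare `%` (as in `term1_%+prov_...`) is malformed, which is plausibly why the proxy answers 400. Two possible remedies are sketched below: percent-encode the filename when building the link, or strip risky characters from the filename up front. Both helpers are hypothetical, the `output/` base path is only inferred from the URLs above, and the "safe" character set is an assumption.

```python
import re
from urllib.parse import quote


def url_for_output(filename, base="output/"):
    """Percent-encode the filename so '%' becomes '%25' and '|' becomes '%7C'."""
    return base + quote(filename, safe="")


def safe_filename(filename):
    """Drop everything except ASCII letters, digits, '.', '_', '+', and '-'."""
    return re.sub(r"[^A-Za-z0-9._+\-]", "", filename)


print(url_for_output("term1_%-250.tsv"))
print(safe_filename("term1_%+genus_militaria|milites-0.tsv"))
```

Stripping is the simpler of the two, since the resulting name needs no escaping anywhere, at the cost of losing the wildcard from the filename.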

Denubis commented 3 years ago

Pushed. Test please?

RayLaurence commented 3 years ago

Ewan, have you cleared the cache in Chrome?


EwanSC commented 3 years ago

Can confirm it works.

Prov. Dalmatia Genus include: milites, militaria date to: 200

This is the new output: 2021-08-25-EDCS_via_Lat_Epig-prov_Dalmatia+to_200+genus_milites-444.tsv (JSON)

This output opens the file download in a new window.