Closed EwanSC closed 3 years ago
I can't replicate. Can you provide more details? (How you're using the scraper, browser, urls if any, etc?)
2021-08-21-province_Dalmatia+inscription_genusmilites+term1%-841.zip
Browser: Chrome
Version 92.0.4515.159 (Official Build) (64-bit)
Huh. ok, will investigate with less alcohol
How tedious. Can't replicate on local binder, or local docker. Have you managed to cause this with any other searches?
Interesting... I will have a try again tomorrow, also with less alcohol
Ok, fixed by shortening filename (and/or by removing the %).
@petrifiedvoices, @EwanSC can we figure out the maximum set of search terms that will still return something? I want to see whether the cause is the % or the filename length.
Also, @petrifiedvoices, @RayLaurence should we have 2021-08-21-prov_Dalmatia+igenus_milites-841.tsv or 2021-08-21-EDCS_via_Lat_Epig-prov_Dalmatia+genus_milites-841.tsv?
Happy to do that
Confirming that the search with Inscription Genus now works for me with the short title changes.
Re filename size issue:
This filename was not too long: 2021-08-22-prov_DaciaDalmatiaPalaestina+term2_milit+from_1+to_200+genus_AugustiAugustaecarminainscriptioneschristianaelegeslibertilibertaelitteraeerasae+not_genus_litteraeinlituramiliariamilitariamilitesmulieresnomensingulare+term1_miles-0.tsv
Search: 2 text terms, 3 province, date to, date from, 6 include genus, 6 exclude genus - result: 243 character filename
This filename was too long [Errno 36]: 2021-08-22-prov_DaciaDalmatiaPalaestina+term2_milit+from_1+to_200+genus_AugustiAugustaecarminainscriptioneschristianaelegeslibertilibertaelitteraeerasaelitteraeinlitura+not_genus_miliariamilitariamilitesmulieresnomensingulareofficiumprofessio+term1_miles-0.tsv
Search: 2 text terms, 3 province, date to, date from, 7 include genus, 6 exclude genus - result: 260 character filename
So I guess the limit is somewhere between these two? 250?
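For what it's worth, the two measurements bracket 255, which is the per-name limit (NAME_MAX, counted in bytes) on most Linux filesystems, and Errno 36 on Linux is ENAMETOOLONG. A minimal sketch of a check that could run before writing the file; `name_max` and `fits_name_limit` are hypothetical helpers, not part of the scraper, and the 255 fallback is an assumption about the host filesystem:

```python
import os

FALLBACK_NAME_MAX = 255  # assumption: typical limit on ext4 and similar filesystems


def name_max(directory="."):
    """Return the longest filename (in bytes) allowed under `directory`."""
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        # os.pathconf is Unix-only and the option may be unknown; use the fallback.
        return FALLBACK_NAME_MAX


def fits_name_limit(filename, limit=FALLBACK_NAME_MAX):
    """The limit counts encoded bytes, not characters, so measure the UTF-8 form."""
    return len(filename.encode("utf-8")) <= limit
```

With these, a 243-character ASCII name passes and a 260-character one does not, which matches the two results above.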
So, to you all, how do we encode this into a filename that is less than uh... lots characters? What do you need to know? Are there standard genus abbreviations?
Whilst I think Ray will have a lot more to say on this, perhaps we could limit to three genus in the filename and then a generic '+(number)more' or something when there are more than three? I think most people using it would change the filename anyway, but this is just a hunch. Another option would maybe be reducing the expressions with 'tituli -' at the beginning to just 't-'. All just brainstorming...
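The "first three, then a count" idea could be sketched roughly like this. `summarise_genus` is a hypothetical name, and dropping spaces and slashes is an assumption based on how the existing filenames look (e.g. `AugustiAugustae`, `litteraeerasae`):

```python
def summarise_genus(genera, keep=3, sep=""):
    """Join the first `keep` genus names and replace the rest with '+Nmore'."""
    # Drop spaces and slashes so the names match the style of the existing filenames.
    cleaned = [g.replace(" ", "").replace("/", "") for g in genera]
    shown = sep.join(cleaned[:keep])
    extra = len(cleaned) - keep
    return f"{shown}+{extra}more" if extra > 0 else shown


print(summarise_genus([
    "Augusti/Augustae", "carmina", "inscriptiones christianae",
    "leges", "liberti/libertae", "litterae erasae",
]))
```

The full list of terms would then have to live somewhere other than the filename, since the count alone does not say which genera were dropped.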
Given a query of:
$ ./src/lat_epig/parse.py -o or -v Dacia -v Dalmatia -v Palaestina -t milit -df 1 -dt 200 -ig "Augusti/Augustae" -ig carmina -ig "inscriptiones christianae" -ig "leges" -ig "litterae erasae" -ig "litterae in litura" -ig "miliaria" -ig "militaria" -ng "mulieres" -ng "nomen singulare" -ng "ordo decurionum" -ng "reges diplomata" -ng "ordo equester" -ng "titul fabricationis" -ng "tituli honorarii" -ng "tituli prossessionis" miles
What do you think of it outputting:
'2021-08-22-EDCS_via_Lat_Epig-term1_miles+op_or+term2_milit+prov_Dacia|Dal᷃|Pal᷃+from_1+to_200+genus_Aug᷃|car᷃|ins᷃|leges|lit᷃|litteraeinlitura|mil᷃|militaria+not_genus_mul᷃|nom᷃|o+++-0.json'
and
'2021-08-22-EDCS_via_Lat_Epig-term1_miles+op_or+term2_milit+prov_Dacia|Dal᷃|Pal᷃+from_1+to_200+genus_Aug᷃|car᷃|ins᷃|leges|lit᷃|litteraeinlitura|mil᷃|militaria+not_genus_mul᷃|nom᷃|o+++-0.tsv'
(I think that's the right unicode to indicate abbreviation?) I also have a json dump because we have enough metadata that encoding it in the filename is too much.
Here's what it'd look like:
If there are better unicode combining marks, let me know? (or if there's a better way to show abbreviations?) If we're happy with this, I'll start moving the map stuff to run on JSON rather than TSV, since it makes showing the metadata in the legend less sucky.
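The abbreviation pattern in the proposed filenames can be reproduced with a rule like the one below. This is inferred from the example output only, not taken from the scraper's code: names of five letters or fewer stay whole, longer names are cut to three letters plus a combining mark, and a name whose three-letter stem is already taken stays whole. The mark is assumed to be U+1DC3 (COMBINING SUSPENSION MARK), and `abbreviate_all` is a hypothetical helper:

```python
SUSPENSION_MARK = "\u1dc3"  # assumption: the combining mark used in the filenames above


def abbreviate_all(names, stem=3, keep_whole=5):
    """Shorten each name to `stem` letters plus a mark; short or clashing names stay whole."""
    seen = set()
    out = []
    for name in names:
        compact = name.replace(" ", "").replace("/", "")
        short = compact[:stem]
        if len(compact) <= keep_whole or short in seen:
            out.append(compact)  # too short to bother, or the stem is already in use
        else:
            seen.add(short)
            out.append(short + SUSPENSION_MARK)
    return "|".join(out)


print(abbreviate_all(["Dacia", "Dalmatia", "Palaestina"]))
```

One thing to keep in mind: the combining mark takes three bytes in UTF-8, so an abbreviated name costs six bytes against the filesystem limit rather than four.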
It seems the filename may also struggle with '%', as this output also can't be opened in my Chrome. I get 400 bad request: 2021-08-24-EDCS_via_LatEpig-term1%+prov_Dalmatia+to_200+genus_militaria|milites-0.tsv (JSON)
Prov. Dalmatia Genus include: milites, militaria date to: 200
Binder: https://hub-binder.mybinder.ovh/user/mqancienthistor-scrapernotebook-9kj3l4ba/voila/render/EpigraphyScraper.ipynb?token=x-lNs6O2QGGGX65OKFiu6Q Bad request 400: https://hub-binder.mybinder.ovh/user/mqancienthistor-scrapernotebook-9kj3l4ba/voila/render/output/2021-08-24-EDCS_via_Lat_Epig-term1_%+prov_Dalmatia+to_200+genus_militaria%7Cmilites-0.tsv
Thanks... that's useful. (I thought I had fixed it but... no, that's useful.)
Oh, duh. Ok, an nginx proxy means that it's not properly escaping the %. Which I should remove anyways...
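The failing URL carries a bare `%` followed by `+`, which is not a valid percent-escape, and that is likely why the proxy answers with a 400. A small sketch of the two ways out, using only the standard library; `safe_filename` and `download_href` are hypothetical names, not functions in the scraper:

```python
from urllib.parse import quote, unquote


def safe_filename(name):
    """Option 1: strip '%' from the filename so the link never needs escaping for it."""
    return name.replace("%", "")


def download_href(name):
    """Option 2: percent-encode the filename before using it as a URL path segment."""
    return quote(name, safe="")


raw = "term1_%+prov_Dalmatia"
print(safe_filename(raw))   # term1_+prov_Dalmatia
print(download_href(raw))   # term1_%25%2Bprov_Dalmatia
```

Removing the `%` is the simpler of the two, since the wildcard adds little to a filename, while encoding keeps the name faithful to the query.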
Pushed. Test please?
Ewan have you cleared cache in Chrome?
Can confirm it works.
Prov. Dalmatia Genus include: milites, militaria date to: 200 This is new output: 2021-08-25-EDCS_via_Lat_Epig-prov_Dalmatia+to_200+genus_milites-444.tsv (JSON)
This output opens file download in new window
Changing Inscription Genus means that generated TSV will not open. New window opens with error:
"400 Bad Request nginx/1.19.2"
Query: Province:Dalmatia Inscription Genus...: milites