jugoetz / chemical-ims-webpage

Django Web-app for inventory management (checkin/checkout) of chemical bottles with a unique ID
https://jugoetz.com/cbs
MIT License

Add the group selection criteria to the fetching url #10

Closed gnezd closed 1 year ago

gnezd commented 3 years ago

The fetching url here https://github.com/jugoetz/chemical-ims-webpage/blob/caee031cbe17836259bdcf89516147371c6c2016/inventorymanagement/updateFromExpereact.py#L184

can be improved to something like

http://expereact.ethz.ch/searchstock?for=chemexper&bl=100000&search=+AND+Field10.6%3D%22GBOD%40%40%22+OR+Field10.6%3D%22GYAM%40%40%22+OR+Field10.6%3D%22GZEN%40%40%22&so=Field10.15&for=report&mime_type=application/vnd.ms-excel

To avoid hard-coding the group names in there, the key is the GET parameter search. In this case, URI-decoding it gives

&search=+AND+Field10.6=\"GBOD@@\"+OR+Field10.6=\"GYAM@@\"+OR+Field10.6=\"GZEN@@\"

So having parse_expereact(url) build the URI seems like a good idea to me: maintain a list of groups, append +OR+Field10.6=\"<group prefix>@@\" for each, and URL-encode the result.
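A minimal sketch of what this could look like (the group list and function name are illustrative, not from the repo):

```python
from urllib.parse import quote

GROUPS = ["GBOD", "GYAM", "GZEN"]  # example group prefixes, not the real config


def build_search_param(groups):
    """Join one Field10.6 clause per group with +OR+, then URL-encode."""
    clauses = "+OR+".join(f'Field10.6="{g}@@"' for g in groups)
    # quote() percent-encodes = " and @ but leaves + alone (it is in `safe`)
    return quote(f"+AND+{clauses}", safe="+")


url = (
    "http://expereact.ethz.ch/searchstock?for=chemexper&bl=100000"
    f"&search={build_search_param(GROUPS)}"
    "&so=Field10.15&for=report&mime_type=application/vnd.ms-excel"
)
```

Adding a group is then just appending to the list; the encoded `search` parameter comes out identical to the hand-built URL above.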

Benefit?

It saves roughly 75% of the network traffic, and probably also the HTML generation/parsing load on both the Expereact server and bode-pi2. I would actually suggest updating the database more frequently, not just daily; is there a reason against it? If we could update more often, this would be a significant saving of resources.

-rw-r--r-- 1 gnezd gnezd  38M Nov  6 20:46 all-entries.html
-rw-r--r-- 1 gnezd gnezd 8.4M Nov  6 21:03 selective.html
jugoetz commented 3 years ago

Good point! The idea behind downloading all entries was scalability of the system: adding a new group is now just a one-line change in filter_groups(), as opposed to fiddling with finicky URL parameters. But I wasn't aware of URI encoding, so that is a good alternative. I changed the update schedule to daily because I'm worried about SQL lockout during the location updates (1-2 min).

gnezd commented 3 years ago

I see, good point on the SQL lockout. Do you have an idea whether it is the parsing and the actual SQLite operations that take the time, or the fetching of the HTML? I have a vague feeling it is the latter, since the download runs at only ~300 kB/s...

A long-term solution would be to move away from SQLite to MySQL, which allows simultaneous multi-session access since it was designed for that. The code change should also be minimal, because you interact with the DB entirely through SQL here. Not sure about your admin page, though.

A short-term solution would be to make sure the SQL connection is only open for short periods and closed often during the slow parts (iterating through 8k+ rows?). Not sure; I need to think again when I'm more awake.

Seems to be OK: https://stackoverflow.com/questions/10325683/can-i-read-and-write-to-a-sqlite-database-concurrently-from-multiple-connections

If a lockout is experienced, it might have a different cause...

jugoetz commented 3 years ago

So fetching the HTML takes ~15 s for just the Bode chemicals, and a few minutes for the entire list. But the limiting factor for download speed isn't the Raspi, it is the Expereact website. The location update is the other major time consumer at ~2 min. This is why I reduced the update cronjob to once daily at 4 am. Switching to MySQL might require a change in the pandas.DataFrame.to_sql() statement; not sure if this is easy or not. The admin page will be fine.
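For what it's worth, to_sql() accepts any SQLAlchemy engine, so the backend switch should mostly be a connection-string change. A sketch (table and column names are illustrative, not the real schema; the MySQL URL is an example only):

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"code": ["A1", "B2"], "location": ["shelf-3", "fridge-1"]})

engine = create_engine("sqlite://")  # in-memory SQLite for the demo
# For MySQL it would be something like:
# engine = create_engine("mysql+pymysql://user:password@host/dbname")

# Same call works against either backend
df.to_sql("bottle", engine, if_exists="replace", index=False)
```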

A compromise would be to call updateFromExpereact.py without a flag every 2 h and with a --location flag once at night, adding an if sys.argv[1] == '--location' clause around the call to updateLocation() in updateFromExpereact.py.

jugoetz commented 3 years ago

Checked the SO link; seems OK indeed. Concurrent access only works for SELECT, not for UPDATE, but each individual query will be very short, so other processes will just slide in at some point. I'll increase the update frequency 👍

jugoetz commented 1 year ago

The timing proved to be no problem, closing.