Closed rakaar closed 3 years ago
Also, please specify the reason why is this a "WIP"?
I wanted to test the search dropdown once. I sent the PR for the suggestions regarding scrapped data and data format of subjects.
Like CODE - SUBJECT NAME
, this format is okay or any other format?
Update readme with the required details about the script.
The readme.md currently contains important links and image of the demo, there is no information regarding any script. It would be odd if information regarding only this script is based any other places?
Update readme with the required details about the script.
The readme.md currently contains important links and image of the demo, there is no information regarding any script. It would be odd if information regarding only this script is based any other places?
Yes. We need to start doing it.
Also, please specify the reason why is this a "WIP"?
I wanted to test the search dropdown once. I sent the PR for the suggestions regarding scrapped data and data format of subjects. Like
CODE - SUBJECT NAME
, this format is okay or any other format?
Ask for forgiveness than permission.
Yes, it looks good to me.
Hello @rakaar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
iqps/data/migrations/0006_auto_20200809_1841.py
:Line 16:121: E501 line too long (170 > 120 characters)
iqps/frontend/forms.py
:Line 11:1: E302 expected 2 blank lines, found 1 Line 16:1: E302 expected 2 blank lines, found 1 Line 18:33: E127 continuation line over-indented for visual indent
iqps/upload/forms.py
:Line 38:30: W605 invalid escape sequence '\d' Line 43:5: E722 do not use bare 'except' Line 111:1: W293 blank line contains whitespace
scripts/scrape_subjects.py
:Line 9:1: E302 expected 2 blank lines, found 1 Line 10:121: E501 line too long (135 > 120 characters) Line 11:38: E231 missing whitespace after ',' Line 15:77: E226 missing whitespace around arithmetic operator Line 17:1: E305 expected 2 blank lines after class or function definition, found 1 Line 21:1: E302 expected 2 blank lines, found 1 Line 28:13: W291 trailing whitespace Line 31:9: E722 do not use bare 'except' Line 34:1: E305 expected 2 blank lines after class or function definition, found 1 Line 41:9: E201 whitespace after '{' Line 41:24: E203 whitespace before ':' Line 44:29: W292 no newline at end of file
Had bugs in the prev code, fixed this
Currently like this
i have another small concern- What if there is a new course added? Then someone has to ensure that the script is run after every semester and JSON file is updated. anyway to ensure this @grapheo12 @thealphadollar
i have another small concern- What if there is a new course added? Then someone has to ensure that the script is run after every semester and JSON file is updated. anyway to ensure this @grapheo12 @thealphadollar
The concern seems valid.
I can think of two ways to go around it, don't be limited to these.
Allow users to select a option that the course is not present in the list and submit without it. In this manner, course will not be a required field.
This method is an extension where we would allow users to specify subject code if it is already not in the list governed by the RegEx of subject code. On the backend we would add it directly to the JSON.
On a personal note, I prefer second method since it makes us less worried about subject codes JSON.
I'm aware of the security concerns; however, with the RegEx filtration it becomes very less of a problem.
Please do point out flaws in this method and, recommended, try to think of better ways 🔥
Allow users to select a option that the course is not present in the list and submit without it. In this manner, course will not be a required field.
If the course is not there, then how will someone search for the paper? 🤔 . It is compulsory
This method is an extension where we would allow users to specify subject code if it is already not in the list governed by the RegEx of subject code. On the backend we would add it directly to the JSON
Writting to a file directly is not a good idea. Because of concurrent requests, it will erase data. For example for an existing array Arr
. One request came to append a
, another req came to append b
, both reqs came nearly at the same time. Then we either the JSON will have Arr, a
or Arr, b
. This is not an unlikely thing. This happened in kwoc 2019.
One more option is save these subjects in Database. But since we are scraping these subjects directly from site and updating every semester(4 months), updating DB is heavy lifting.
The best option which seems to me is Use some bot or something else to remind. Since we are scraping all UG and PG courses from insti site, we should not be missing any courses. But for Just in Case, we can have an input string field, where user can enter manually. And maybe let that subject be updated next semester on our list.
let that subject be updated next semester on our list.
Yes, this seems good. We can also avoid the issue of concurrent writes here by simply using append to the file :D
One request came to append a, another req came to append b, both reqs came nearly at the same time. Then we either the JSON will have Arr, a or Arr, b. This is not an unlikely thing. This happened in kwoc 2019.
This is a bit more complex: we can use a simple lock on the file and wait for the previous lock to free. Since we are 100% that we will not be having huge concurrent traffic, this ever this situation comes in one month or two, the lock would be able to handle it easily.
This is a bit more complex: we can use a simple lock on the file and wait for the previous lock to free. Since we are 100% that we will not be having huge concurrent traffic, this ever this situation comes in one month or two, the lock would be able to handle it easily.
If we are ready to write code for locks and all that stuff. Then I feel saving the subjects in DB is a much simpler option.
The point is not about huge concurrent traffic. In attempt to append new subjects(which are few in number), we are losing a few of them-whats the point? these options seem simple to me - 1.update at the end of the semester or 2. use DB for subjects too
update after a live discussion
I have added lock using filelock package but i am facing these two issues
docker-compose run web pip install filelock
, even after that it says no module named filelock
code_subject.json.lock
- I have setup using docker container and tried installing through
docker-compose run web pip install filelock
, even after that it saysno module named filelock
Interesting. If I were you, I would try to find the python version the docker file is using and see if the package is available for that version. If it is available, I would let it be installed through requirements.txt
. Even then if it fails, I'd have to investigate more with error logs etc.
- what do i put in the
code_subject.json.lock
Nothing :D It is just there to signify of a lock. It is not supposed to be committed.
The code_subjects.json.lock is not required to be committed
Does it create the file on its own if not found ?
LOCK_PATH = os.path.join(BASE_DIR, 'static/files/code_subjects.json.lock')
lock = FileLock(LOCK_PATH, timeout=3)
In future it will cause error right, if someone doesn't create an empty file by that name at that location.Hence I committed that file
I'm sure it creates the lock file on its own.
However, I would recommend trying it once and then seeing the result yourself.
Tested on local machine. It works fine. So I am merging it. Great work @rakaar Thanks for the reviews and the insights @thealphadollar
/scripts/scrape_subjects.py
/iqps/static/files/code_subjects.json
/iqps/upload/views.py
, add a subject withTypedChoiceField
Linked issue - #8