themousepotato closed 5 years ago
Awesome! This is part of the data validation step. Where do you suggest this check be added? 🤔
Immediately after scraping IMHO. If you want it more structured, you can write a validator which can be called manually after the scraper. But that would be overkill.
Agreed! I think we should apply the known validation heuristics during scraping, before the data is written to the output JSON. The scraper code should be responsible for providing structured, usable and valid data from websites.
I'll have a go... The code that needs changing is here, right? https://github.com/kshitij10496/hercules/blob/01e93c11248968947cf0786d2eb2694aeec2265d/data/scrapper/course_rooms.py#L82-L86
Hey @Pikachu920 ! 👋 Thanks for picking this up.
The code that needs changing is here, right?
Yes, I think so too. A validation check here should be the ideal way to fix this.
Thoughts @themousepotato ?
Sorry for the late comment. @kshitij10496 You're right. @Pikachu920 That's exactly the validation part. Thanks for finding time to point that out :)
@Pikachu920 Are you still interested in fixing this? 😄
absolutely -- i'll try it now!
There are room numbers with value '0'. Replace those with 'In Dept'. Also, there are 'In Deptt'. Replace those with 'In Dept' ;)
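For reference, a minimal sketch of the two replacements above, assuming each scraped room value is a plain string. The names `ROOM_FIXES`, `normalize_room` and `normalize_rooms` are hypothetical and are not taken from `course_rooms.py`; the actual fix may look different.

```python
# Hypothetical helper: names and data shape are assumptions, not the repo's code.
# Known bad room values mapped to the canonical label.
ROOM_FIXES = {
    "0": "In Dept",
    "In Deptt": "In Dept",
}


def normalize_room(room: str) -> str:
    """Return the canonical label for a single scraped room value."""
    cleaned = room.strip()
    # Values not listed in ROOM_FIXES pass through unchanged (minus outer whitespace).
    return ROOM_FIXES.get(cleaned, cleaned)


def normalize_rooms(rooms):
    """Apply normalize_room to every value in a list of scraped rooms."""
    return [normalize_room(room) for room in rooms]
```

Something like `normalize_rooms(rooms)` could be called right after scraping and before the data is written to the output JSON, which keeps the check inside the scraper as discussed.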