metakgp / mftp

CDC noticeboard scraper
GNU Affero General Public License v3.0
34 stars 42 forks source link

Add attachments to the emails #31

Closed djbarnwal closed 4 years ago

djbarnwal commented 4 years ago

A few of the notices on the ERP notice board have a PDF file attached to them. It would be nice to have these attachments also available in the MFTP forums.

thealphadollar commented 4 years ago

@icyflame @hargup I can see that we have the code for getting attachment from the ERP (L77) but then it is removed as a sanitisation in L108. I'm curious to know why this step has been taken? We should have been sending attachments at some point, I suppose.

Also, do you think providing the attachment URL is of use? I guess not since the links are protected behind ERP auth and hence they will have to login, and ERP does not use referrer head to redirect user post login. Also, the CDC has removed attachments from their public notice board(http://www.cdc.iitkgp.ac.in/notice/Placements) so we cannot put links from there as well.

icyflame commented 4 years ago

I'm curious to know why this step has been taken?

The function removes the raw attachment bytes from the record that's going to be stored in the database. The key is deleted only from the shallow copy of the notice object. The shallow copy isn't the one that's actually sent to the user.

        sanitised_notice = sanitise_notice_for_database(notice)

        # The notice object used to check whether the current notice has already been sent to the
        # user or not => This object does NOT have the attachment_raw key
        db_notice = notices_coll.find_one(sanitised_notice)

        if db_notice is None:
            # The notice object that's sent to the user => this does have the attachment_raw key
            different_notices.append(notice)

As long as nothing has changed in the way attachments are stored and returned by the ERP, the existing code should have just continued to work.

sriyash421 commented 4 years ago

l take this up after the current CDC, so as to not disturb the equilibrium when it's needed the most. @thealphadollar