Determine uniform output from the parsing of pdf files

The current parsing logic only extracts the grades. We should also extract metadata such as semester number, branch name, credits, and SPI. I have done preliminary work to extract this information and produce the uniform JSON structure detailed below.

        results = [
            {
                "name"                : "<student full name>",
                "rollno"              : "<rollno>",
                "program"             : "<btech, mtech, etc>",
                "branch"              : "<branchname>",
                "semester"            : "<semester number>",
                "pdf_filename"        : "<filename>",
                "pdf_pagenum"         : "<pagenum>",
                "release_date"        : "<date>",
                "examination_date"    : "<date>",
                "notice"              : "<notice>",
                "SPI"                 : "<spi>",
                "total_credits"       : "<total_credits>",
                "papers_failed"       : ["sub1_code", "sub2_code", ...],
                "marks"               : {
                    "<subject1_code>"    : "<marks>",
                    "<subject2_code>"    : "<marks>",
                    ...
                }
            },
            # ...
        ]
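
To keep the output uniform across PDFs, each parsed record could be checked against the key set above before it is written out. The following is only a minimal sketch: `REQUIRED_KEYS` and `validate_record` are hypothetical names that do not exist in the repo yet, and the type checks assume `papers_failed` is a list and `marks` is a dict, as the structure above suggests.

```python
# Keys copied from the proposed structure in this issue.
REQUIRED_KEYS = {
    "name", "rollno", "program", "branch", "semester",
    "pdf_filename", "pdf_pagenum", "release_date", "examination_date",
    "notice", "SPI", "total_credits", "papers_failed", "marks",
}


def validate_record(record):
    """Return a list of problems found in one parsed record (empty if none)."""
    problems = []

    # Every record must carry the full key set so consumers can rely on it.
    missing = REQUIRED_KEYS - set(record)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))

    # Assumption: papers_failed is a list of subject codes.
    if not isinstance(record.get("papers_failed", []), list):
        problems.append("papers_failed must be a list")

    # Assumption: marks maps subject code -> marks.
    if not isinstance(record.get("marks", {}), dict):
        problems.append("marks must be a dict")

    return problems
```

Running `validate_record` over every entry in `results` and logging any non-empty return value would flag pages where the parser dropped a field.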

I am currently using pdfminer and tabula to extract the data. Some related work was done in commit 2366fc1ebac70b31cc79e0dc9bc1829dab46fe05.

FYI - @tezas @himanshuhy
