From what I remember about the parser, this is how it works:
There is a set of predefined schemas. The parser picks the schema most likely to fit, then tries to parse the HTML with it. Each schema is written by hand, based on key elements and tags found in the HTML.
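To make the schema-picking step concrete, here is a rough sketch of how it could work. This is not the parser's actual code: the `Schema` class, the `pick_schema` function, and the marker strings are all made up for illustration, and the scoring (count how many marker strings appear in the HTML) is just one simple way to decide which schema is "likely to work the best".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Schema:
    """A hand-written description of one platform's roster HTML (hypothetical)."""
    name: str
    markers: List[str]  # lowercase substrings we expect in that platform's HTML

    def score(self, html: str) -> int:
        # Count how many of this schema's markers appear in the page.
        lowered = html.lower()
        return sum(1 for marker in self.markers if marker in lowered)


# Placeholder markers only; the real schemas would list the actual
# elements and tags found in each platform's HTML.
SCHEMAS = [
    Schema("d2l", ["d2l-table", "d2l-grid"]),
    Schema("blackboard", ["bb-roster", "bb-user-row"]),
    Schema("moodle", ["moodle-participants", "moodle-user-cell"]),
]


def pick_schema(html: str, schemas: List[Schema] = SCHEMAS) -> Optional[Schema]:
    """Return the best-scoring schema, or None if no marker matches at all."""
    best = max(schemas, key=lambda schema: schema.score(html))
    return best if best.score(html) > 0 else None


if __name__ == "__main__":
    page = '<table class="d2l-table"><tr><td>Jane Doe</td></tr></table>'
    chosen = pick_schema(page)
    print(chosen.name if chosen else "no schema matched")
```

Returning None when nothing matches lets the caller report an unrecognized page instead of forcing a schema that will parse it badly.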
The parser should be able to parse all D2L, Blackboard, and Moodle HTML. Currently I see it handling D2L and Moodle well, with the occasional mishap on Blackboard. Not every CMS provides emails, so some sets have been parsed correctly but contain only names. We need to figure out how to proceed with those.
There will be situations where the parser works almost perfectly for a school but needs further customization before we can fully depend on it. For instance, with the set "ISCI 1102" at Savannah State University, it puts the first and last name in the name column and the word "student" in the email column.
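One way to catch cases like this automatically would be a sanity check on the email column after parsing. The sketch below is an assumption, not something the parser does today: `check_rows` and `EMAIL_RE` are hypothetical names, and the pattern is a loose "looks like an email" test rather than full address validation.

```python
import re
from typing import List, Optional, Tuple

# Loose shape check: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

Row = Tuple[str, Optional[str]]  # (name, email) as produced by the parser


def check_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split parsed rows into (ok, flagged) based on the email column."""
    ok: List[Row] = []
    flagged: List[Row] = []
    for name, email in rows:
        cleaned = (email or "").strip()
        if EMAIL_RE.match(cleaned):
            ok.append((name, cleaned))
        else:
            # Covers a role word such as "student" as well as a missing email.
            flagged.append((name, email))
    return ok, flagged


if __name__ == "__main__":
    sample = [("Jane Doe", "student"), ("John Roe", "john.roe@example.edu")]
    ok, flagged = check_rows(sample)
    print("ok:", ok)
    print("flagged:", flagged)
```

A set where every row is flagged is probably either a names-only set or a column mix-up like the one above, so it could be routed for manual review.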