kenkangxgwe opened 6 years ago
Yet another parser, which I think is better: https://github.com/ScaDS/dblp-parser
The structure of the dblp data is as follows:
Ground tags: incollection www mastersthesis book proceedings phdthesis inproceedings article
6173776 tags in total.
46750 incollection tags in total.
2064044 www tags in total.
10 mastersthesis tags in total.
15023 book tags in total.
36599 proceedings tags in total.
64924 phdthesis tags in total.
2151343 inproceedings tags in total.
1795083 article tags in total.
incollection tag ranges: (ee,0,1) (note,0,1) (chapter,0,1) (year,1,1) (author,0,50) (title,1,1) (cdrom,0,1) (url,1,1) (number,0,1) (pages,0,1) (publisher,0,1) (cite,0,104) (booktitle,1,1) (crossref,0,1)
www tag ranges: (ee,0,1) (note,0,8) (editor,0,6) (year,0,1) (author,0,10) (cite,0,30) (title,0,1) (crossref,0,1) (booktitle,0,1) (url,0,17)
mastersthesis tag ranges: (ee,0,1) (year,1,1) (school,1,1) (author,1,1) (title,1,1) (url,0,1)
book tag ranges: (ee,0,7) (editor,0,13) (note,0,1) (year,1,1) (author,0,18) (isbn,0,4) (title,1,1) (cdrom,0,1) (url,0,2) (volume,0,1) (pages,0,2) (month,0,1) (school,0,2) (series,0,2) (publisher,0,2) (cite,0,741) (booktitle,0,1)
proceedings tag ranges: (ee,0,5) (editor,0,31) (note,0,2) (address,0,1) (year,1,1) (author,0,1) (isbn,0,3) (title,1,1) (url,0,2) (volume,0,2) (number,0,1) (journal,0,1) (pages,0,1) (series,0,2) (publisher,0,2) (cite,0,212) (booktitle,0,1) (crossref,0,1)
phdthesis tag ranges: (ee,0,7) (note,0,2) (year,1,1) (author,1,3) (isbn,0,3) (title,1,1) (url,0,1) (volume,0,1) (number,0,1) (pages,0,2) (month,0,1) (school,0,3) (series,0,1) (publisher,0,1)
inproceedings tag ranges: (ee,0,7) (note,0,1) (editor,0,3) (year,1,1) (author,0,139) (title,1,1) (cdrom,0,2) (url,0,3) (number,0,1) (pages,0,1) (month,0,1) (cite,0,137) (booktitle,1,1) (crossref,0,2)
article tag ranges: (ee,0,2) (note,0,2) (editor,0,5) (year,1,1) (author,0,287) (title,1,1) (cdrom,0,1) (url,0,1) (volume,0,1) (number,0,1) (journal,0,1) (pages,0,1) (month,0,1) (publisher,0,1) (cite,0,348) (crossref,0,1) (booktitle,0,1)
Used: 32 seconds
A brief explanation of the output: each entry has the form (tagname, min, max), the minimum and maximum number of times that child tag occurs in a single record of the given type. min=0 means the tag is absent in some records.
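For reference, the counts and (tagname, min, max) ranges above can be computed in one streaming pass. The following is a minimal sketch assuming Python's standard `xml.etree.ElementTree.iterparse`; it is not the analyzer's actual code, and the function name `tag_ranges` and the three-record sample are hypothetical. The real dblp.xml uses character entities declared in dblp.dtd, which `xml.etree` does not resolve by default, so running this on the full dump would need an extra step that is not shown here.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Record types listed as "Ground tags" in the output above.
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis", "www"}


def tag_ranges(source):
    """Return (records per type, {type: {child tag: (min, max)}})."""
    counts = Counter()
    seen = defaultdict(dict)        # type -> child -> (min, max) over records that have it
    present = defaultdict(Counter)  # type -> child -> records containing it at least once
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the enclosing <dblp> element
    for event, elem in context:
        if event != "end" or elem.tag not in RECORD_TAGS:
            continue
        counts[elem.tag] += 1
        # Count direct children of this one record only.
        for child, n in Counter(c.tag for c in elem).items():
            lo, hi = seen[elem.tag].get(child, (n, n))
            seen[elem.tag][child] = (min(lo, n), max(hi, n))
            present[elem.tag][child] += 1
        root.clear()  # drop finished records so memory stays flat
    ranges = {
        rec: {
            # min is 0 when at least one record of this type lacks the child tag
            child: (0 if present[rec][child] < counts[rec] else lo, hi)
            for child, (lo, hi) in children.items()
        }
        for rec, children in seen.items()
    }
    return counts, ranges


# Hypothetical three-record sample; the real dump is far larger.
SAMPLE = b"""<dblp>
<article key="a1"><author>A</author><author>B</author><title>T1</title><year>2001</year></article>
<article key="a2"><author>C</author><title>T2</title><year>2002</year><ee>x</ee></article>
<www key="w1"><author>D</author><title>Home Page</title><url>u</url><url>v</url></www>
</dblp>"""

counts, ranges = tag_ranges(io.BytesIO(SAMPLE))
for rec, children in ranges.items():
    print(rec, "tag ranges:", " ".join(f"({c},{lo},{hi})" for c, (lo, hi) in children.items()))
```

Clearing the root after each record is what keeps a single pass over millions of records feasible; without it, every parsed record stays attached to `<dblp>` until the end.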
The analyzer used is https://github.com/lvergergsk/dblpParser, and I'm writing the parser; feel free to discuss or contribute.
Article Histogram: {ee={1=1667488, 2=68275}, note={1=839, 2=61}, editor={1=3, 2=5, 3=2, 4=2, 5=2}, year={1=1761497}, author={1=372504, 2=515270, 3=411305, 4=236108, 5=111568, 6=50516, 7=22804, 263=1, 8=11949, 9=6534, 10=4092, 11=2351, 12=1581, 13=1058, 14=787, 15=555, 16=381, 17=317, 18=234, 19=180, 20=176, 21=135, 22=112, 23=88, 24=81, 25=61, 26=48, 27=58, 28=57, 29=47, 30=47, 31=17, 287=2, 32=30, 33=27, 34=19, 35=15, 36=11, 37=13, 38=11, 39=6, 40=8, 41=4, 42=10, 43=7, 44=6, 45=6, 46=3, 47=8, 48=6, 49=2, 50=6, 51=2, 52=5, 53=1, 54=2, 55=5, 56=3, 57=5, 58=2, 59=2, 60=1, 61=2, 64=2, 65=1, 67=2, 68=1, 69=2, 71=1, 74=1, 75=2, 78=1, 79=1, 86=1, 92=1, 95=1, 96=1, 99=1, 101=1, 105=1, 112=1, 119=1}, title={1=1731386}, cdrom={1=4001}, article={1=1761503}, url={1=1760957}, volume={1=1760791}, number={1=1411077}, pages={1=1553175}, journal={1=1761273}, month={1=10600}, cite={1=155, 2=32, 3=25, 4=20, 5=41, 6=25, 7=23, 8=44, 9=39, 10=40, 11=38, 12=47, 13=43, 14=30, 15=48, 16=47, 17=43, 18=54, 19=39, 20=35, 21=44, 22=43, 23=35, 24=55, 25=39, 26=40, 27=37, 28=50, 29=36, 30=40, 31=39, 32=27, 33=27, 34=33, 35=32, 36=13, 37=24, 38=26, 39=11, 40=25, 41=19, 42=16, 43=12, 44=9, 45=16, 46=14, 47=10, 48=16, 49=9, 50=8, 51=9, 52=7, 53=12, 54=10, 55=8, 56=6, 57=3, 58=8, 59=1, 60=8, 61=3, 62=8, 63=6, 64=3, 65=5, 66=2, 67=2, 68=4, 69=2, 70=3, 71=4, 73=2, 74=1, 76=2, 78=1, 79=1, 81=5, 83=1, 84=1, 86=1, 87=1, 89=1, 90=1, 91=1, 92=1, 348=1, 94=1, 99=1, 100=1, 101=1, 105=1, 106=1, 107=1, 109=1, 114=1, 116=2, 117=1, 120=1, 123=1, 126=1, 136=1, 137=1, 140=1, 158=1, 159=1, 163=1, 165=2, 171=1, 172=1, 174=1, 194=1, 198=1, 205=1, 232=1, 249=1, 252=1}, publisher={1=228}, crossref={1=1886}, booktitle={1=223}}
Article Total: 1761503
Inproceedings Histogram: {ee={1=1600943, 2=359613, 3=14558, 5=2, 7=2}, note={1=228}, editor={1=3, 2=2, 3=3}, year={1=2114232}, author={1=275526, 2=593472, 3=570397, 4=355581, 5=170927, 6=77167, 7=32072, 8=15654, 9=7875, 10=4435, 11=2533, 139=1, 12=1570, 13=1065, 14=708, 15=526, 16=370, 17=265, 18=210, 19=146, 20=131, 21=103, 22=77, 23=53, 24=52, 25=51, 26=28, 27=32, 28=34, 29=20, 30=20, 31=15, 32=10, 33=9, 34=13, 35=6, 36=11, 37=11, 38=4, 39=7, 40=4, 41=3, 42=5, 43=1, 44=5, 45=4, 46=2, 47=2, 48=1, 49=1, 55=1, 56=1, 57=3, 60=1, 61=2, 62=1, 65=1, 70=1, 76=1, 77=3, 94=1, 102=1, 114=1}, title={1=2105658}, inproceedings={1=2114232}, cdrom={1=8052, 2=430}, url={1=2114231}, number={1=379}, pages={1=2004821}, month={1=1}, cite={1=41, 2=66, 3=77, 4=92, 5=91, 6=119, 7=125, 8=165, 9=175, 137=1, 10=234, 11=256, 12=282, 13=278, 14=313, 15=321, 16=291, 17=309, 18=265, 19=260, 20=267, 21=211, 22=229, 23=212, 24=176, 25=173, 26=169, 27=137, 28=125, 29=126, 30=99, 31=82, 32=71, 33=57, 34=61, 35=42, 36=46, 37=36, 38=27, 39=27, 40=36, 41=9, 42=24, 43=15, 44=20, 45=16, 46=10, 47=4, 48=4, 49=5, 50=7, 51=6, 52=3, 53=2, 54=7, 55=10, 56=3, 57=3, 58=2, 59=2, 60=1, 61=6, 62=4, 63=2, 64=3, 65=1, 66=1, 67=1, 68=1, 70=2, 71=2, 72=3, 73=1, 75=1, 76=1, 78=1, 79=1, 81=2, 85=1, 87=3, 88=1, 89=2, 95=1, 100=1, 101=1, 102=1, 122=1, 124=2}, crossref={1=2095909}, booktitle={1=2114232}}
Inproceedings Total: 2114232
Incollection Histogram: {ee={1=43852}, note={1=40939, 2=4810, 3=1168, 4=334, 5=96, 6=32, 7=7, 8=1}, chapter={1=2}, editor={1=1, 2=1, 4=2, 5=1, 6=1}, year={1=46342}, author={1=2006175, 2=44681, 3=8513, 4=3256, 5=1355, 6=659, 7=304, 8=188, 9=89, 10=57, 11=37, 12=22, 13=15, 14=11, 15=11, 16=14, 17=2, 18=2, 19=7, 21=1, 22=2, 25=1, 28=1, 29=2, 32=1, 50=1}, title={1=2074702}, cdrom={1=53}, url={1=77381, 2=7554, 3=3209, 4=1414, 5=812, 6=487, 7=407, 8=305, 9=233, 10=153, 11=79, 12=47, 13=22, 14=16, 15=4, 16=3, 17=1}, number={1=40}, pages={1=42604}, incollection={1=46325}, www={1=2028874}, cite={1=100, 2=6, 3=2, 4=1, 6=1, 7=1, 8=1, 9=2, 10=1, 11=1, 16=1, 17=2, 19=1, 20=1, 23=1, 87=1, 30=2, 31=1, 104=1, 40=1, 43=1, 44=2, 45=1, 49=1, 59=1, 60=1}, publisher={1=91}, crossref={1=42869}, booktitle={1=46326}}
Incollection Total: 46325
Proceedings Histogram: {ee={1=22618, 2=7008, 3=339, 4=2, 5=1}, editor={1=3755, 2=8824, 3=6866, 4=4425, 5=1944, 6=846, 7=399, 8=223, 9=111, 10=77, 11=45, 12=58, 13=21, 14=12, 15=11, 16=11, 17=6, 18=4, 19=4, 20=5, 21=1, 22=2, 23=1, 26=1, 27=1}, note={1=241, 2=6}, address={1=3}, year={1=35916}, author={1=2}, isbn={1=28729, 2=1471, 3=11}, title={1=35858}, url={1=35859, 2=2}, volume={1=16324, 2=1}, number={1=17}, pages={1=8}, journal={1=4}, series={1=16643, 2=2}, publisher={1=34699, 2=5}, cite={212=1}, proceedings={1=35916}, booktitle={1=35438}, crossref={1=10}}
Proceedings Total: 35916
Book Histogram: {ee={1=8778, 2=258, 3=31, 4=26, 5=10, 6=2, 7=1}, editor={1=209, 2=422, 3=382, 4=191, 5=57, 6=22, 7=5, 8=5, 10=1, 13=1}, note={1=5}, year={1=14923}, author={1=8229, 2=3409, 3=1414, 4=347, 5=124, 6=28, 7=29, 8=11, 9=6, 10=2, 12=5, 13=1, 15=1, 17=1, 18=1}, book={1=14923}, isbn={1=10805, 2=2077, 3=78, 4=9}, title={1=14910}, cdrom={1=1}, url={1=1567, 2=2}, volume={1=2813}, pages={1=10858, 2=5}, month={1=1}, school={1=1510, 2=66}, series={1=6281, 2=3}, publisher={1=13919, 2=1}, cite={643=1, 115=1, 741=1, 421=1, 342=1, 284=1, 156=1, 189=1, 365=1, 63=1}, booktitle={1=1269}}
Book Total: 14923
Website Histogram: {ee={1=1}, editor={1=1, 2=1, 4=2, 5=1, 6=1}, note={1=40917, 2=4810, 3=1168, 4=334, 5=96, 6=32, 7=7, 8=1}, year={1=17}, www={1=2028874}, author={1=1991382, 2=34707, 3=2243, 4=183, 5=19, 6=4, 10=1}, cite={1=85, 2=5, 4=1, 6=1, 30=1}, title={1=2028570}, booktitle={1=1}, crossref={1=304}, url={1=31056, 2=7554, 3=3209, 4=1414, 5=812, 6=487, 7=407, 8=305, 9=233, 10=153, 11=79, 12=47, 13=22, 14=16, 15=4, 16=3, 17=1}}
Website Total: 2028874
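The histograms above can be reproduced with a small variation of the same streaming idea: for each record, count its direct children, then file each count k under that record's own type. This is a sketch under the same assumptions as before (standard-library `iterparse`, entities not handled); `tag_histograms` and the sample are hypothetical, and unlike the original output it returns the per-type totals separately instead of listing the record tag itself (e.g. `article={1=...}`). Note that the Incollection histogram above reports `author={1=2006175, ...}` and `www={1=2028874}` against an Incollection Total of 46325, which looks like counts from www records leaking into it; a fresh counter per record, filed only under that record's type, avoids that.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis", "www"}


def tag_histograms(source):
    """Return (records per type, {type: {child tag: Counter({k: n})}}).

    n is the number of records of that type containing the child tag exactly k times.
    """
    totals = Counter()
    hist = defaultdict(lambda: defaultdict(Counter))
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the enclosing <dblp> element
    for event, elem in context:
        if event != "end" or elem.tag not in RECORD_TAGS:
            continue
        totals[elem.tag] += 1
        # A fresh counter per record, filed under this record's own type only.
        for child, k in Counter(c.tag for c in elem).items():
            hist[elem.tag][child][k] += 1
        root.clear()  # drop finished records so memory stays flat
    return totals, hist


# Hypothetical sample, same shape as dblp.xml records.
SAMPLE = b"""<dblp>
<article key="a1"><author>A</author><author>B</author><title>T1</title><year>2001</year></article>
<article key="a2"><author>C</author><title>T2</title><year>2002</year><ee>x</ee></article>
<www key="w1"><author>D</author><title>Home Page</title><url>u</url><url>v</url></www>
</dblp>"""

totals, hist = tag_histograms(io.BytesIO(SAMPLE))
for rec, children in hist.items():
    print(rec.capitalize(), "Histogram:", {c: dict(h) for c, h in children.items()})
    print(rec.capitalize(), "Total:", totals[rec])
```

As in the output above, a tag that is absent from a record contributes nothing, so there are no `0=` entries; the number of records lacking a tag is the type's total minus the sum of that tag's histogram.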
FYI, this is the data I parsed. It gives more detail about how often each attribute occurs: an entry tag={k=n} means that n records of that type contain the tag exactly k times. What confuses me is that the totals do not match the results in https://github.com/lvergergsk/BibGallery-FrontEnd/issues/2#issuecomment-375998147
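The two outputs in this thread also disagree with each other. A small sketch with the numbers copied from above makes the gaps explicit; `count_gaps` is a hypothetical helper, and phdthesis and mastersthesis are left out of the comparison because the histogram output has no totals for them. It only measures the differences and does not explain them; different dblp.xml snapshots or different counting rules would be possible causes, but that is an assumption.

```python
# Per-type counts from the first output (the "tags in total" lines).
ANALYZER_COUNTS = {"article": 1795083, "inproceedings": 2151343,
                   "incollection": 46750, "proceedings": 36599, "book": 15023,
                   "www": 2064044, "phdthesis": 64924, "mastersthesis": 10}
# "... Total" lines from the histogram output ("Website" is the www type).
HISTOGRAM_TOTALS = {"article": 1761503, "inproceedings": 2114232,
                    "incollection": 46325, "proceedings": 35916, "book": 14923,
                    "www": 2028874}


def count_gaps(first, second):
    """Difference first - second for every record type present in both."""
    return {tag: first[tag] - second[tag] for tag in second if tag in first}


# Sanity check: the per-type counts add up to the stated 6173776 tags in total.
assert sum(ANALYZER_COUNTS.values()) == 6173776

for tag, gap in count_gaps(ANALYZER_COUNTS, HISTOGRAM_TOTALS).items():
    print(f"{tag:<14}{ANALYZER_COUNTS[tag]:>10}{HISTOGRAM_TOTALS[tag]:>10}{gap:>8}")
```

The first output is larger for every type the two share, so both would need to be checked against the same dblp.xml file before comparing them with the linked comment.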
Example CSV file for table ARTICLE
Possible Issues
Reference