microsimulation / ijm-xml

XML files for the International Journal of Microsimulation
MIT License

ijm-00003.xml validation #21

Closed gnott closed 5 years ago

gnott commented 5 years ago

I am testing some files from Volume 1. So far, the demo site requires perfectly valid JSON in order for an article to be shown. The JSON converted from ijm-00003.xml is not valid because of the reference list.

@Melissa37 if you could please review this and provide some comments. I do not know whether we can get around the current demo site requiring valid output (as much as valid output is great when it is possible, of course).

I may be able to detect some of the year values and when missing or non-numeric, I might be able to convert the reference from type "book" to "unknown". This might be an alternative to checking and setting every reference correctly.
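A minimal sketch of the fallback described above, not the converter's actual code. The function name `downgrade_undated_refs`, the rule that a valid year is exactly four digits, and the `bib1` entry in the sample are assumptions made for illustration; the other two sample references are trimmed from the XML in this ticket.

```python
import re
import xml.etree.ElementTree as ET


def downgrade_undated_refs(root):
    """Set publication-type="unknown" on book citations without a numeric year.

    Returns the ids of the <ref> elements that were changed.
    """
    changed = []
    for ref in root.iter("ref"):
        citation = ref.find("element-citation")
        if citation is None or citation.get("publication-type") != "book":
            continue
        year = (citation.findtext("year") or "").strip()
        # Assumption: a usable year is exactly four ASCII digits.
        if not re.fullmatch(r"[0-9]{4}", year):
            citation.set("publication-type", "unknown")
            changed.append(ref.get("id"))
    return changed


# bib13 and bib17 are trimmed from this ticket; bib1 is a hypothetical valid entry.
sample = """<ref-list>
  <ref id="bib13"><element-citation publication-type="book">
    <source>Fiscaal Memento</source></element-citation></ref>
  <ref id="bib17"><element-citation publication-type="book">
    <year>various years</year></element-citation></ref>
  <ref id="bib1"><element-citation publication-type="book">
    <year>2007</year></element-citation></ref>
</ref-list>"""

root = ET.fromstring(sample)
print(downgrade_undated_refs(root))
```

With this rule a value such as "2007a" would also be treated as non-numeric, so the pattern may need loosening depending on what the JSON schema accepts.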

Melissa37 commented 5 years ago

ijm-00003.xml

For all references with a simple <person-group> tag, I change it to <person-group person-group-type="author">

I am looking at the XML downloaded from Exeter's FTP site and I see all references with <person-group person-group-type="author">. Exeter have resupplied a lot of XML, and I only needed to look in their two most recent subfolders to get it all. @Vijayarajmurugesan could you clean up the FTP site and get rid of old and replaced XML?

bib13 has no <year> tag and requires one

<ref id="bib13">
                <element-citation publication-type="book">
                    <person-group>
                        <collab>Ministry of Finances (various years)</collab>
                    </person-group>
                    <source>Fiscaal Memento</source>
                    <publisher-loc>Brussels</publisher-loc>
                    <publisher-name>Ministerie van Financi&#x00EB;n</publisher-name>
                    <ext-link ext-link-type="uri"
                        xlink:href="http://www.docufin.fgov.be/websedsdd/intersalgnl/thema/publicaties/memento/memen to.htm"
                        >http://www.docufin.fgov.be/websedsdd/intersalgnl/thema/publicaties/memento/memen
                        to.htm</ext-link>
                </element-citation>
            </ref>

bib17 <year>various years</year> value is not numeric and it must be for validation

<ref id="bib17">
                <element-citation publication-type="book">
                    <person-group person-group-type="editor">
                        <name>
                            <surname>Put</surname>
                            <given-names>J</given-names>
                        </name>
                    </person-group>
                    <year>various years</year>
                    <source>Praktijkboek Sociale Zekerheid voor de onderneming en de sociale
                        adviseur</source>
                    <publisher-loc>Brussels</publisher-loc>
                    <publisher-name>Ced.Samsom</publisher-name>
                </element-citation>
            </ref>

OK, so this is old content and it cannot be reworked. The URL for bib13 is not open and accessible either. I suggest we relax the JSON schema to allow bibs without dates. Ideally, in future IJM would ask for a single date in any citation that has multiple dates, but I am not sure how Kriya can factor that in.

I think it is preferable to have no <year> element than a <year> element with non-numeric content such as <year>various years</year>
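To illustrate that preference, here is a hedged sketch that drops any <year> whose content is not numeric and leaves the rest of the citation untouched. It is not part of Exeter's or the converter's tooling; the name `strip_non_numeric_years` and the four-digit rule are assumptions.

```python
import re
import xml.etree.ElementTree as ET


def strip_non_numeric_years(root):
    """Remove <year> elements whose text is not a four-digit number.

    Returns the text of every <year> that was removed.
    """
    removed = []
    for citation in root.iter("element-citation"):
        # findall returns a list, so removing children while looping is safe.
        for year in citation.findall("year"):
            text = (year.text or "").strip()
            if not re.fullmatch(r"[0-9]{4}", text):
                citation.remove(year)
                removed.append(text)
    return removed


# Trimmed copy of bib17 from this ticket.
bib17 = """<ref id="bib17">
  <element-citation publication-type="book">
    <person-group person-group-type="editor">
      <name><surname>Put</surname><given-names>J</given-names></name>
    </person-group>
    <year>various years</year>
    <publisher-loc>Brussels</publisher-loc>
  </element-citation>
</ref>"""

ref = ET.fromstring(bib17)
print(strip_non_numeric_years(ref))
print(ref.find("element-citation/year"))
```

After the call the citation has no <year> child, which matches the state bib13 is already in.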

Melissa37 commented 5 years ago

To clarify for @Vijayarajmurugesan the only fix required is for bib17 and to remove `<year>various years</year>`

gnott commented 5 years ago

I'm seeing <person-group> tags (with no other attributes) on the XML file for article 00003 on the FTP site, so assuming those will be <person-group person-group-type="author"> for the next time I download that would be great, thanks!

I have put together a potential workaround for when there is no year for a reference that will allow me to validate files as part of the process, and the web page view of the reference will look ok. If you could please remove the <year>various years</year> tag, as shown above, that would be good too, thanks.

Melissa37 commented 5 years ago

@Vijayarajmurugesan

I'm seeing <person-group> tags (with no other attributes) on the XML file for article 00003 on the FTP site, so assuming those will be <person-group person-group-type="author"> for the next time I download that would be great, thanks!

Regarding this comment, can you remove old files that have since been replaced/updated? That will make it easier for us. Thanks!

Vijayarajmurugesan commented 5 years ago

@Melissa37 I have removed all the old files in the FTP. Now you will get all updated files from there.

Melissa37 commented 5 years ago

@Vijayarajmurugesan Thank you!

Can you confirm re

To clarify for @Vijayarajmurugesan the only fix required is for bib17 and to remove `<year>various years</year>`

Thanks!

Melissa

Vijayarajmurugesan commented 5 years ago

@Melissa37 I would like to check with you, can we remove the text "various years" from bib 13 reference too. Thanks, Vijay

Melissa37 commented 5 years ago

I would like to check with you, can we remove the text "various years" from bib 13 reference too.

We did not see "various years" in bib13, just no year value: bib13 has no <year> tag and requires one. See the XML pasted into the ticket in the comment above.

Melissa37 commented 5 years ago

@Vijayarajmurugesan can you confirm the file loaded to the ftp site contains this correction? Can you indicate here in Github when you've resupplied in future?

Thanks!

gnott commented 5 years ago

I converted the XML from ijm-00003-vor-r2.zip, and the <person-group> tags with no type are still there, e.g.

<person-group>
<collab>European Community EC</collab>
</person-group>

@Melissa37 if you could please check these to confirm you are also seeing these?

These <collab> values will not be present in the output if the type of person is not specified in the XML.
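A small sketch of how untyped groups could be listed before conversion, so that dropped <collab> values are caught early. This is an illustration only: the function name `untyped_person_groups` and the ref ids in the sample are hypothetical, since the ticket does not say which reference the European Community example comes from.

```python
import xml.etree.ElementTree as ET


def untyped_person_groups(root):
    """List (ref id, names) for each <person-group> lacking person-group-type."""
    found = []
    for ref in root.iter("ref"):
        for group in ref.iter("person-group"):
            if group.get("person-group-type") is None:
                names = [el.text for el in group.iter()
                         if el.tag in ("collab", "surname")]
                found.append((ref.get("id"), names))
    return found


# Hypothetical ref ids; the <person-group> content is taken from this ticket.
sample = """<ref-list>
  <ref id="bib-a"><element-citation publication-type="book">
    <person-group>
      <collab>European Community EC</collab>
    </person-group></element-citation></ref>
  <ref id="bib-b"><element-citation publication-type="book">
    <person-group person-group-type="editor">
      <name><surname>Put</surname><given-names>J</given-names></name>
    </person-group></element-citation></ref>
</ref-list>"""

for ref_id, names in untyped_person_groups(ET.fromstring(sample)):
    print(ref_id, names)
```

A stricter pipeline could raise an error when the returned list is non-empty instead of printing it.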

Vijayarajmurugesan commented 5 years ago

@Vijayarajmurugesan can you confirm the file loaded to the ftp site contains this correction? Can you indicate here in Github when you've resupplied in future?

Thanks!

@Melissa37 Today, I have created a new folder named "Resupply" and uploaded the updated files. You can now access all the resupply files from the link below:

https://exeterpremedia.exavault.com/share/view/1i5dm-9s6a5s81

Thanks, Vijay

Melissa37 commented 5 years ago

@gnott

I converted the XML from ijm-00003-vor-r2.zip, and the <person-group> tags with no type are still there, e.g.

<person-group>
<collab>European Community EC</collab>
</person-group>

@Melissa37 if you could please check these to confirm you are also seeing these?

These <collab> values will not be present in the output if the type of person is not specified in the XML.

This is what I see in the resupplies today:

<person-group person-group-type="author">
                        <collab>European Community EC</collab>
                    </person-group>

gnott commented 5 years ago

I see <person-group person-group-type="author"> in the latest resupply of this article, thanks!

I will remove the code change I made to account for basic <person-group> tags, and it will fail on any articles that have any remaining.

gnott commented 5 years ago

Converted this article and all that is reported here is good.

The pub date is still April 30, which should be December 31, otherwise it is correct.

@Melissa37 should I close this, because the file is valid?

Melissa37 commented 5 years ago

@gnott sure, fine to close this. The date changes required are captured in ticket #22 so @Vijayarajmurugesan will get to them there :-)