okfn / ckanext-tsbsatellites

repo for Satellites Catapult's CKAN
http://data.satapps.org/
GNU Affero General Public License v3.0

Dataset URIs #14

Closed adamamyl closed 10 years ago

adamamyl commented 10 years ago

On a dataset, the resource URI points to a domain rather than a specific document, e.g. on

http://185.30.10.28:8081/dataset/bilsat-1/resource/713f82be-40f7-41ac-8cfd-d1d8fb213cb2

Also, no preview is available. Is the source wrong, or is the parser doing something? The aim would be to link to the document (and preview it).

amercader commented 10 years ago

I think this is an issue with the original metadata documents, but I can't confirm as I can't access the staging site.

Looking at the sample file that we were sent, I suspect that the links to the remote resources are not defined correctly in the ISO docs:

  <gmd:distributionInfo>
      <gmd:MD_Distribution>
         <gmd:distributionFormat>
            <gmd:MD_Format>
               <gmd:name>
                  <gco:CharacterString>GeoTIFF</gco:CharacterString>
               </gmd:name>
               <gmd:version>
                  <gco:CharacterString>1.0</gco:CharacterString>
               </gmd:version>
            </gmd:MD_Format>
         </gmd:distributionFormat>
         <gmd:distributor>
            <gmd:MD_Distributor>
               <gmd:distributorContact>
                  <gmd:CI_ResponsibleParty>
                     <gmd:organisationName>
                        <gco:CharacterString>Astrium GEO-Information Services</gco:CharacterString>
                     </gmd:organisationName>
                     <gmd:contactInfo>
                        <gmd:CI_Contact>
                           <gmd:address>
<!-- ... -->
                           </gmd:address>
                           <gmd:onlineResource>
                              <gmd:CI_OnlineResource>
                                 <gmd:linkage>
                                    <gmd:URL>http://www.astrium-geo.com/en/143-spot-satellite-imagery</gmd:URL>
                                 </gmd:linkage>
                                 <gmd:description>
                                    <gco:CharacterString>Spot Imagery</gco:CharacterString>
                                 </gmd:description>
                              </gmd:CI_OnlineResource>
                           </gmd:onlineResource>
                        </gmd:CI_Contact>
                     </gmd:contactInfo>
                     <gmd:role>
                        <gmd:CI_RoleCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/ML_gmxCodelists.xml#CI_RoleCode"
                                         codeListValue="distributor"/>
                     </gmd:role>
                  </gmd:CI_ResponsibleParty>
               </gmd:distributorContact>
            </gmd:MD_Distributor>
         </gmd:distributor>
         <gmd:transferOptions>
            <gmd:MD_DigitalTransferOptions>
               <gmd:onLine>
                  <gmd:CI_OnlineResource>
                     <gmd:linkage>
                        <gmd:URL>http://www.astrium-geo.com/geostore/</gmd:URL>
                     </gmd:linkage>
                     <gmd:description>
                        <gco:CharacterString>Airbus Defence and Space Store</gco:CharacterString>
                     </gmd:description>
                  </gmd:CI_OnlineResource>
               </gmd:onLine>
            </gmd:MD_DigitalTransferOptions>
         </gmd:transferOptions>
      </gmd:MD_Distribution>
  </gmd:distributionInfo>

It looks like the actual resource (http://www.astrium-geo.com/en/143-spot-satellite-imagery) is linked inside the contactInfo section of the distributor section rather than as one of the transferOptions, which is where the parsers look for it.

For reference, the XPaths used to extract the resources of a dataset are:

"gmd:distributionInfo/gmd:MD_Distribution/gmd:transferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource",
"gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource"
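To illustrate the point, here is a minimal sketch that runs the two XPaths above against a trimmed copy of the sample record. It uses Python's standard library, not the harvester's own parsing code. The `gmd:MD_Metadata` root element, the `SAMPLE` string and the `resource_urls` helper are assumptions made for this example. Only the URL under transferOptions is picked up; the one under distributorContact is never visited.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the prefixes used in the sample.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed copy of the sample record (root element assumed): one URL sits
# under distributorContact, the other under transferOptions.
SAMPLE = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:distributionInfo>
    <gmd:MD_Distribution>
      <gmd:distributor>
        <gmd:MD_Distributor>
          <gmd:distributorContact>
            <gmd:CI_ResponsibleParty>
              <gmd:contactInfo>
                <gmd:CI_Contact>
                  <gmd:onlineResource>
                    <gmd:CI_OnlineResource>
                      <gmd:linkage>
                        <gmd:URL>http://www.astrium-geo.com/en/143-spot-satellite-imagery</gmd:URL>
                      </gmd:linkage>
                    </gmd:CI_OnlineResource>
                  </gmd:onlineResource>
                </gmd:CI_Contact>
              </gmd:contactInfo>
            </gmd:CI_ResponsibleParty>
          </gmd:distributorContact>
        </gmd:MD_Distributor>
      </gmd:distributor>
      <gmd:transferOptions>
        <gmd:MD_DigitalTransferOptions>
          <gmd:onLine>
            <gmd:CI_OnlineResource>
              <gmd:linkage>
                <gmd:URL>http://www.astrium-geo.com/geostore/</gmd:URL>
              </gmd:linkage>
            </gmd:CI_OnlineResource>
          </gmd:onLine>
        </gmd:MD_DigitalTransferOptions>
      </gmd:transferOptions>
    </gmd:MD_Distribution>
  </gmd:distributionInfo>
</gmd:MD_Metadata>
"""

# The two XPaths quoted above, relative to the metadata root.
RESOURCE_XPATHS = [
    "gmd:distributionInfo/gmd:MD_Distribution/gmd:transferOptions"
    "/gmd:MD_DigitalTransferOptions/gmd:onLine/gmd:CI_OnlineResource",
    "gmd:distributionInfo/gmd:MD_Distribution/gmd:distributor/gmd:MD_Distributor"
    "/gmd:distributorTransferOptions/gmd:MD_DigitalTransferOptions"
    "/gmd:onLine/gmd:CI_OnlineResource",
]


def resource_urls(xml_text):
    """Collect the linkage URLs of every CI_OnlineResource the XPaths match."""
    root = ET.fromstring(xml_text)
    urls = []
    for xpath in RESOURCE_XPATHS:
        for node in root.findall(xpath, NS):
            url = node.findtext("gmd:linkage/gmd:URL", default="", namespaces=NS)
            if url.strip():
                urls.append(url.strip())
    return urls


print(resource_urls(SAMPLE))
```

With this record the only URL returned is the geostore landing page, which matches the behaviour reported on the dataset.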

When the site is back online I'll double-check against this particular linked dataset.

adamamyl commented 10 years ago

@amercader the site's up again (issue with BT, now resolved)

adamamyl commented 10 years ago

This one's a little tricky. I've spoken with S&D, and their preferred resolution is to trawl through a maximum of five 'definitions' (I've forgotten the XML terminology), checking each one in turn to see whether its URI is a resource that could be previewed and, if so, using that URI as the source. If that fails, can we fall back to an image or placeholder text?

Is this possible? Maybe within a couple of days of engineering time?
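As a rough sketch of what that could look like (not existing extension code): take the candidate URIs in document order, look at no more than five, use the first one that seems previewable, and otherwise return a placeholder. The names `pick_preview_url`, `looks_previewable` and `MAX_CANDIDATES`, and the extension-based check, are all assumptions for illustration; a real check would probably need to inspect the response's content type.

```python
from urllib.parse import urlparse

MAX_CANDIDATES = 5  # the "maximum of five" from the request above

# Assumption: a URL counts as previewable if its path ends in one of these.
PREVIEWABLE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".pdf", ".html", ".htm")


def looks_previewable(url):
    """Cheap, offline guess based only on the file extension in the URL path."""
    path = urlparse(url).path.lower()
    return path.endswith(PREVIEWABLE_EXTENSIONS)


def pick_preview_url(candidates, is_previewable=looks_previewable, placeholder=None):
    """Return the first previewable URL among the first five candidates.

    Falls back to `placeholder` when none of them qualifies.
    """
    for url in candidates[:MAX_CANDIDATES]:
        if is_previewable(url):
            return url
    return placeholder


candidates = [
    "http://www.astrium-geo.com/geostore/",
    "http://example.org/quicklook.png",  # hypothetical URL
]
print(pick_preview_url(candidates, placeholder="no-preview.png"))
```

Passing the check in as a function keeps the selection logic testable without network access.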

nigelbabu commented 10 years ago

I'll check with Adria.

amercader commented 10 years ago

I've modified the extension's harvester to treat any URL with no format as HTML, so at least we get an iframe. We need to review the results to see if they look right. The example mentioned won't work because the embedded website has weird initialization with iframes as well.
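The idea can be sketched roughly as follows. This is not the actual harvester change: `with_default_format` is a hypothetical helper, and it assumes resources are plain dicts with `url` and `format` keys.

```python
def with_default_format(resource, default="html"):
    """Return a copy of the resource dict, filling in a format if it has none."""
    updated = dict(resource)
    has_url = bool(updated.get("url"))
    has_format = bool((updated.get("format") or "").strip())
    if has_url and not has_format:
        # No declared format: label it as HTML so a web page preview is attempted.
        updated["format"] = default
    return updated


print(with_default_format({"url": "http://www.astrium-geo.com/geostore/", "format": ""}))
```

Resources that already declare a format are left untouched, so only the unlabelled links end up in an iframe.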

adamamyl commented 10 years ago

http://185.30.10.28:8081/dataset/beijing-1/resource/0f6162ed-f4da-440c-a88b-55bc1fd8976a is a nice example