openschemas / schemaorg

python functions for applied use of schema.org
https://openschemas.github.io/schemaorg/
Mozilla Public License 2.0

Collection Schema doesn't exist #34

Closed abrin closed 3 years ago

abrin commented 3 years ago

Describe the bug

Hi, I was able to find the definition of a Collection schema in the [csv](https://raw.githubusercontent.com/openschemas/schemaorg/master/schemaorg/data/releases/7.03/all-layers-types.csv) and JSON-LD files, but when I try to use it, it is reported as not existing (the output of the test I created is below). I'm not sure whether I'm missing something:

Specification base set to http://www.schema.org
Using Version 7.03
Collection
Did you mean:
CollectionPage
---------------------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------------------
WARNING /Users/abrin/schemaorg/recipe.yml does not exist.
WARNING /Users/abrin/schemaorg/recipe.yml does not exist.
ERROR Collection is not a valid type!

To Reproduce

Steps to reproduce the behavior:

I created a basic test:

from schemaorg.templates.google import make_person
from schemaorg.main.parse import RecipeParser
from schemaorg.main import Schema
import shutil
import os
import unittest
import tempfile

print("######################################################## test_schema")

class TestSchema(unittest.TestCase):

    def setUp(self):
        self.tmpdir = os.path.join(tempfile.gettempdir(), 'schemaorg-test')

        if not os.path.exists(self.tmpdir):
            os.mkdir(self.tmpdir)

    def tearDown(self):
        pass

    def test_collection(self):

        self.here = os.path.abspath(os.path.dirname(__file__))
        recipe_yml = os.path.join(self.here, "recipe.yml")
        self.recipe = RecipeParser(recipe_yml)
        self.collection = Schema("Collection")
        self.collection.add_property('name', 'test')
        self.collection.validate(self.collection)

if __name__ == '__main__':
    unittest.main()

recipe.yml

version: 1
schemas:
  Dataset:
    recommended:
      - softwareVersion: version
      - citation
      - identifier
      - keywords
      - license
      - url
      - sameAs
      - spatialCoverage
      - temporalCoverage
      - variableMeasured
      - includedInDataCatalog
    required:
      - description
      - name
  Person|Organization:
    required:
      - description
      - name
  Collection:
    required:
      - name

Thanks.

Version of Python schemaorg: latest

vsoch commented 3 years ago

Thank you for the detailed test script - it made it easy to debug!

So the error isn't wrong - the file we get schema types from is called schema-types.csv, and that does not have a Collection. If I look at the difference between the two, the file you reference has many more:

```python
set(layers).difference(set(schema))
Out[15]: {'3DModel', 'Abdomen', 'ActiveNotRecruiting', 'AdvertiserContentArticle', 'AerobicActivity', 'AnaerobicActivity', 'AnalysisNewsArticle', 'AnatomicalStructure', 'AnatomicalSystem', 'Anesthesia', 'Appearance', 'ApprovedIndication', 'ArchiveComponent', 'ArchiveOrganization', 'Artery', 'AskPublicNewsArticle', 'Atlas', 'Audiobook', 'AuthenticContent', 'AuthoritativeLegalValue', 'Ayurvedic', 'BackgroundNewsArticle', 'Bacteria', 'Balance', 'BenefitsHealthAspect', 'BloodTest', 'Bone', 'BrainStructure', 'BrokerageAccount', 'BusOrCoach', 'CDCPMDRecord', 'CT', 'CarUsageType', 'Cardiovascular', 'CardiovascularExam', 'CaseSeries', 'CategoryCode', 'CategoryCodeSet', 'CausesHealthAspect', 'Chapter', 'Chiropractic', 'Claim', 'Class', 'Clinician', 'CohortStudy', 'Collection', 'ComicCoverArt', 'ComicIssue', 'ComicSeries', 'ComicStory', 'CommunityHealth', 'CompleteDataFeed', 'Completed', 'Consortium', 'ContagiousnessHealthAspect', 'CorrectionComment', 'CoverArt', 'CovidTestingFacility', 'CriticReview', 'CrossSectional', 'CssSelectorType', 'DDxElement', 'DefinedTerm', 'DefinedTermSet', 'DefinitiveLegalValue', 'Dentistry', 'Dermatologic', 'Dermatology', 'Diagnostic', 'DiagnosticLab', 'DiagnosticProcedure', 'Diet', 'DietNutrition', 'DietarySupplement', 'DoseSchedule', 'DoubleBlindedTrial', 'Drawing', 'DrivingSchoolVehicleUsage', 'Drug', 'DrugClass', 'DrugCost', 'DrugCostCategory', 'DrugLegalStatus', 'DrugPregnancyCategory', 'DrugPrescriptionStatus', 'DrugStrength', 'Ear', 'EducationalOccupationalCredential', 'EducationalOccupationalProgram', 'Emergency', 'EmployerReview', 'Endocrine', 'EnrollingByInvitation', 'EventAttendanceModeEnumeration', 'EventSeries', 'EvidenceLevelA', 'EvidenceLevelB', 'EvidenceLevelC', 'ExchangeRateSpecification', 'ExchangeRefund', 'ExercisePlan', 'Eye', 'FDAcategoryA', 'FDAcategoryB', 'FDAcategoryC', 'FDAcategoryD', 'FDAcategoryX', 'FDAnotEvaluated', 'Flexibility', 'FloorPlan', 'FullRefund',
'FundingAgency', 'FundingScheme', 'Fungus', 'Gastroenterologic', 'Genetic', 'Genitourinary', 'GeospatialGeometry', 'Geriatric', 'Grant', 'GraphicNovel', 'Guide', 'Gynecologic', 'Head', 'HealthAspectEnumeration', 'HealthInsurancePlan', 'HealthPlanCostSharingSpecification', 'HealthPlanFormulary', 'HealthPlanNetwork', 'HealthTopicContent', 'Hematologic', 'Homeopathic', 'HowOrWhereHealthAspect', 'ImagingTest', 'InForce', 'Infectious', 'InfectiousAgentClass', 'InfectiousDisease', 'InternationalTrial', 'InvestmentFund', 'Joint', 'LaboratoryScience', 'LegalForceStatus', 'LegalValueLevel', 'Legislation', 'LegislationObject', 'LeisureTimeActivity', 'LibrarySystem', 'LifestyleModification', 'Ligament', 'LinkRole', 'LivingWithHealthAspect', 'Longitudinal', 'Lung', 'LymphaticVessel', 'MRI', 'Manuscript', 'MaximumDoseSchedule', 'MayTreatHealthAspect', 'MediaManipulationRatingEnumeration', 'MediaReview', 'MedicalAudience', 'MedicalBusiness', 'MedicalCause', 'MedicalClinic', 'MedicalCode', 'MedicalCondition', 'MedicalConditionStage', 'MedicalContraindication', 'MedicalDevice', 'MedicalDevicePurpose', 'MedicalEntity', 'MedicalEnumeration', 'MedicalEvidenceLevel', 'MedicalGuideline', 'MedicalGuidelineContraindication', 'MedicalGuidelineRecommendation', 'MedicalImagingTechnique', 'MedicalIndication', 'MedicalIntangible', 'MedicalObservationalStudy', 'MedicalObservationalStudyDesign', 'MedicalProcedure', 'MedicalProcedureType', 'MedicalResearcher', 'MedicalRiskCalculator', 'MedicalRiskEstimator', 'MedicalRiskFactor', 'MedicalRiskScore', 'MedicalScholarlyArticle', 'MedicalSign', 'MedicalSignOrSymptom', 'MedicalSpecialty', 'MedicalStudy', 'MedicalStudyStatus', 'MedicalSymptom', 'MedicalTest', 'MedicalTestPanel', 'MedicalTherapy', 'MedicalTrial', 'MedicalTrialDesign', 'MedicalWebPage', 'MedicineSystem', 'MerchantReturnEnumeration', 'MerchantReturnFiniteReturnWindow', 'MerchantReturnNotPermitted', 'MerchantReturnPolicy', 'MerchantReturnUnlimitedWindow', 'MerchantReturnUnspecified',
'Midwifery', 'MisconceptionsHealthAspect', 'MissingContext', 'MixedEventAttendanceMode', 'MonetaryGrant', 'MoneyTransfer', 'MortgageLoan', 'Motorcycle', 'MotorizedBicycle', 'MultiCenterTrial', 'MulticellularParasite', 'Muscle', 'Musculoskeletal', 'MusculoskeletalExam', 'Neck', 'Nerve', 'Neuro', 'Neurologic', 'NewsMediaOrganization', 'Newspaper', 'NoninvasiveProcedure', 'Nose', 'NotInForce', 'NotYetRecruiting', 'Nursing', 'OTC', 'Observation', 'Observational', 'Obstetric', 'OccupationalActivity', 'OccupationalTherapy', 'OfferForLease', 'OfferForPurchase', 'OfferShippingDetails', 'OfficialLegalValue', 'OfflineEventAttendanceMode', 'Oncologic', 'OnlineEventAttendanceMode', 'OpenTrial', 'OpinionNewsArticle', 'Optician', 'Optometric', 'OriginalShippingFees', 'Osteopathic', 'Otolaryngologic', 'OverviewHealthAspect', 'PET', 'PalliativeProcedure', 'PartiallyInForce', 'Pathology', 'PathologyTest', 'Patient', 'PatientExperienceHealthAspect', 'Pediatric', 'PercutaneousProcedure', 'PharmacySpecialty', 'PhysicalActivity', 'PhysicalActivityCategory', 'PhysicalExam', 'PhysicalTherapy', 'Physiotherapy', 'PlaceboControlledTrial', 'PlasticSurgery', 'Play', 'PodcastEpisode', 'PodcastSeason', 'PodcastSeries', 'Podiatric', 'Poster', 'PrescriptionOnly', 'PreventionHealthAspect', 'PreventionIndication', 'PrimaryCare', 'Prion', 'ProductReturnEnumeration', 'ProductReturnFiniteReturnWindow', 'ProductReturnNotPermitted', 'ProductReturnPolicy', 'ProductReturnUnlimitedWindow', 'ProductReturnUnspecified', 'PrognosisHealthAspect', 'Project', 'PronounceableText', 'Property', 'Protozoa', 'Psychiatric', 'PsychologicalTreatment', 'PublicHealth', 'PublicToilet', 'Pulmonary', 'Quotation', 'RadiationTherapy', 'RadioBroadcastService', 'Radiography', 'RandomizedTrial', 'RealEstateListing', 'Recommendation', 'RecommendedDoseSchedule', 'Recruiting', 'RefundTypeEnumeration', 'Registry', 'ReimbursementCap', 'RelatedTopicsHealthAspect', 'Renal', 'RentalVehicleUsage', 'RepaymentSpecification',
'ReportageNewsArticle', 'ReportedDoseSchedule', 'ResearchProject', 'RespiratoryTherapy', 'RestockingFees', 'ResultsAvailable', 'ResultsNotAvailable', 'Retail', 'ReturnFeesEnumeration', 'ReturnShippingFees', 'ReviewNewsArticle', 'Rheumatologic', 'RisksOrComplicationsHealthAspect', 'SatiricalArticle', 'Schedule', 'SchoolDistrict', 'ScreeningHealthAspect', 'SeeDoctorHealthAspect', 'SelfCareHealthAspect', 'SheetMusic', 'ShortStory', 'SideEffectsHealthAspect', 'SingleBlindedTrial', 'SingleCenterTrial', 'Skin', 'SpecialAnnouncement', 'SpeechPathology', 'StagesHealthAspect', 'StatisticalPopulation', 'StoreCreditRefund', 'StrengthTraining', 'StupidType', 'Substance', 'SuperficialAnatomy', 'Surgical', 'SurgicalProcedure', 'Suspended', 'SymptomsHealthAspect', 'TaxiVehicleUsage', 'Terminated', 'Therapeutic', 'TherapeuticProcedure', 'Thesis', 'Throat', 'TouristDestination', 'TouristTrip', 'Toxicologic', 'TraditionalChinese', 'TreatmentIndication', 'TreatmentsHealthAspect', 'TripleBlindedTrial', 'TypesHealthAspect', 'Ultrasound', 'UnofficialLegalValue', 'Urologic', 'UsageOrScheduleHealthAspect', 'UserReview', 'Vein', 'Vessel', 'VeterinaryCare', 'VirtualLocation', 'Virus', 'VitalSign', 'WebAPI', 'WebContent', 'WesternConventional', 'Wholesale', 'Withdrawn', 'WorkBasedProgram', 'XPathType', 'XRay'}
```

including Collection :)

The structure of these files seems the same, but I'm not sure what distinguishes the two. Could you ping folks at schema.org and figure out what the difference is between them?

If it doesn't hurt to switch to the layers file (since it includes everything in the other one), then we can update to that. If it's not the right thing to do, then I can add a variable to let you choose the file. I'd also be curious about the other ext files that have types - are they included in the layers file?

If we can find the "one file to rule them all" in terms of types, or at least understand the difference, I'd be happy to make a PR to update the package so it works for your use case. Thank you!
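For anyone wanting to repeat the comparison above, here is a minimal sketch of diffing the type names in two CSV files. It assumes each file has a header row with a `label` column holding the type name; that column name, the helper names `type_labels` and `missing_types`, and the tiny inline stand-ins for the two release files are all assumptions for illustration, not part of the schemaorg package.

```python
import csv
import io


def type_labels(csv_text, column="label"):
    """Return the set of non-empty values in `column` from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row.get(column)}


def missing_types(layers_csv, schema_csv, column="label"):
    """Return the sorted type names present in layers_csv but absent from schema_csv."""
    return sorted(type_labels(layers_csv, column) - type_labels(schema_csv, column))


# Hypothetical two-row stand-ins for all-layers-types.csv and schema-types.csv.
layers_csv = (
    "id,label\n"
    "http://schema.org/Collection,Collection\n"
    "http://schema.org/CollectionPage,CollectionPage\n"
)
schema_csv = (
    "id,label\n"
    "http://schema.org/CollectionPage,CollectionPage\n"
)

print(missing_types(layers_csv, schema_csv))  # prints ['Collection']
```

To run it against the real release files, read each file's text with `open(path, newline="").read()` and pass it in place of the inline strings, adjusting `column` if the header differs.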

abrin commented 3 years ago

Good morning. I took a closer look at the schema.org site and their releases and noticed that this repo stops at 7.03, while they're up to version 12.0. It also looks like they may have deprecated the all-types file (see), because the schema-types file only has the core elements, if I'm reading the comment properly. Is it possible to update to use the newer structure?

Thanks,

Adam

vsoch commented 3 years ago

Yes definitely possible. I can possibly get to this on a weekend, and please feel free to do a PR with the new files first if you have the time!

vsoch commented 3 years ago

hey! So I think I can make some time at the end of the work day to fix this up for you - I'll prepare a PR with the new version for you to test.

vsoch commented 3 years ago

Give this a test out! https://github.com/openschemas/schemaorg/pull/35

I'm switching from CircleCI to GitHub actions, so ignore the CI results for now (I disconnected it).

abrin commented 3 years ago

Thanks!

Just tried and I'm getting: ERROR /workspace/schemaorg/schemaorg/data/releases/12.0/schema-types.csv does not exist.

followed the install instructions from:

git clone https://www.github.com/openschemas/schemaorg
cd schemaorg
python setup.py install

Should it be reading schemaorg-current-http-types.csv and schemaorg-current-http-properties.csv instead of schema-properties.csv and schema-types.csv?

I tried making that change, but now I get:

Specification base set to http://www.schema.org
Using Version 12.0
Found http://www.schema.org/Collection
Collection: found 121 properties
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    collection.validate(collection)
AttributeError: 'Schema' object has no attribute 'validate'

vsoch commented 3 years ago

Make sure that you clone the branch in the linked PR (in your example you clone the main repository, which doesn't have the changes). Also, I don't believe self.collection.validate was ever a supported function. The validation is done by the recipe parser, e.g.,

recipe = RecipeParser("recipe.yml")

Otherwise, the output looks good - I see that the Collection is found for version 12.0 with 121 properties!

> Should it be reading schemaorg-current-http-types.csv and schemaorg-current-http-properties.csv instead of schema-properties.csv and schema-types.csv?

If you take a look at the changed code, you'll see we are using those files, just the https versions.
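As an illustration of the file-name question above, here is a small sketch that looks for a types file in a release directory, trying the newer naming first and falling back to the older one. The helper `find_types_file` and the candidate order are assumptions for this example; this is not how the schemaorg package itself resolves the file, and the temporary directory only stands in for a real release folder.

```python
import os
import tempfile

# Newer releases use the "schemaorg-current-https-*" names; 7.03 used "schema-types.csv".
CANDIDATES = ("schemaorg-current-https-types.csv", "schema-types.csv")


def find_types_file(release_dir, candidates=CANDIDATES):
    """Return the path of the first candidate file that exists in release_dir."""
    for name in candidates:
        path = os.path.join(release_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(
        "none of %s found in %s" % (", ".join(candidates), release_dir)
    )


# Demo with a throwaway directory standing in for data/releases/12.0.
with tempfile.TemporaryDirectory() as release_dir:
    open(os.path.join(release_dir, CANDIDATES[0]), "w").close()
    found = os.path.basename(find_types_file(release_dir))

print(found)  # prints schemaorg-current-https-types.csv
```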

abrin commented 3 years ago

Confirming I'm on the right branch. I'm wondering if schemaorg/data/__init__.py needs to be changed? Changing it is what got me from the previous error to the current one:

abrin@GT29036 schemaorg % git branch    
* add/release-12.0
  master
abrin@GT29036 schemaorg % python test.py
Specification base set to http://www.schema.org
Using Version 12.0
Found http://www.schema.org/Collection
Collection: found 121 properties
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    collection.validate(collection)
AttributeError: 'Schema' object has no attribute 'validate'

Just not sure how to address that. Attached my two test files, which I've pulled out of the test (can't seem to get pytest to run them without a module error, and this seemed faster).

test.zip

thanks

vsoch commented 3 years ago

Sorry, I'm not sure if you are hearing me - the Schema object does not have a validate function.

vsoch commented 3 years ago

In your example, you would need to do:

collection = Schema('Collection')
recipe = RecipeParser("recipe.yml")
recipe.validate(collection)

Does that make sense?
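To make the division of labour concrete, here is a rough sketch of the kind of check a recipe can express: which required properties a schema instance is still missing. It is not the RecipeParser.validate implementation. The function `missing_required`, the plain-dict recipe (mirroring part of recipe.yml above), and the treatment of `Person|Organization` as "either type" are assumptions for illustration.

```python
def missing_required(recipe, schema_type, properties):
    """Return the required property names for schema_type that are absent from properties."""
    for key, spec in recipe.get("schemas", {}).items():
        # Assumption: a key such as "Person|Organization" applies to either type.
        if schema_type in key.split("|"):
            required = spec.get("required", [])
            return [name for name in required if name not in properties]
    raise KeyError("%s is not described in the recipe" % schema_type)


# Plain-dict stand-in for the relevant parts of recipe.yml.
recipe = {
    "version": 1,
    "schemas": {
        "Person|Organization": {"required": ["description", "name"]},
        "Collection": {"required": ["name"]},
    },
}

print(missing_required(recipe, "Collection", {"name": "test"}))  # prints []
print(missing_required(recipe, "Person", {"name": "Adam"}))      # prints ['description']
```

The point of the sketch is only that the rules live in the recipe, so the object that holds the recipe is the natural place for validation, rather than the Schema object.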

abrin commented 3 years ago

Yes, that makes sense. Thanks for your patience with me, and apologies for the confusion and misunderstanding I caused. I tested with the correct validation code, which I had mis-entered. It works, but only with a change to these two lines:

the reference to schema-properties in "schemaorg/schemaorg/data/__init__.py:90" and the reference to schema-types.csv in "schemaorg/schemaorg/data/__init__.py:104"

vsoch commented 3 years ago

@abrin that was my mistake! I had the changes locally and for some reason they didn't push. Please take another look!

abrin commented 3 years ago

That did it. My test works. thank you.

vsoch commented 3 years ago

Great! Apologies for my oversight of not pushing the commits. I was testing different gpg keys yesterday and I think one of my commits just didn't take (and the other to add the new data was so large I didn't notice).

I'll get the PR merged and released asap.

vsoch commented 3 years ago

Fixed with #35.