Convert license.txt to JSON

samreid commented 9 years ago

From https://github.com/phetsims/tasks/issues/274, @pixelzoom suggested converting license.txt to JSON:

Design (15+ hours)

Replace license.txt with license.json
Define the fields in license.json, which are required/optional, etc.
Require a license field for 3rd-party resources

File conversion (10-20 hours)

Convert all existing license.txt files to license.json (shell script).
Review all license.json files, add missing information (manual process).

Proposed format for license.json, example:

{
  "phetImage.png": {
    "source": "PhET",
    "author": "John Doe",
    "notes": "optional notes"
  },
  "thirdPartyImage.png": {
    "source": "Organization or individual(s) who own the resource",
    "url": "the URL for the organization or individual(s)",
    "license": "the license name",
    "licenseURL": "www.somesite.com/license.txt",
    "notes": "optional notes"
  }
}

Description of fields:

Required for all entries:

source: Tells us whether the resource is owned by "PhET" or a 3rd-party. For 3rd-parties, this field identifies the organization or individual(s) who own the resource.

Required for all entries where source === "PhET":

author: Identifies the individual(s) that created the resource.

Required for all entries where source !== "PhET":

url: The URL of the organization or individual(s) that own the resource. If there is no URL, put "none".

license: Identifies the official name of the license under which PhET is using the resource, e.g., "The MIT License".

licenseURL: URL to the specific license. If there is no URL, put "none".

Optional for all entries:

notes: Optional for all entries, misc notes about the resource.
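
The required/optional rules above can be sketched as a small check. This is only an illustration of the proposed format: the function name `validateProposedEntry` and the wording of the problem messages are hypothetical and do not come from chipper.

```javascript
// Hypothetical validator for the proposed license.json format described above.
// Returns an array of problem strings; an empty array means the entry passes.
function validateProposedEntry( filename, entry ) {
  var problems = [];

  // A field counts as present only if it is a non-empty string.
  function requireField( name ) {
    if ( typeof entry[ name ] !== 'string' || entry[ name ].length === 0 ) {
      problems.push( filename + ': missing required field "' + name + '"' );
    }
  }

  // Required for all entries
  requireField( 'source' );

  if ( entry.source === 'PhET' ) {
    // Required where source === "PhET"
    requireField( 'author' );
  }
  else {
    // Required where source !== "PhET"; "none" is an allowed value for the URLs
    requireField( 'url' );
    requireField( 'license' );
    requireField( 'licenseURL' );
  }

  // "notes" is optional, so it is not checked.
  return problems;
}
```

For example, `validateProposedEntry( 'phetImage.png', { source: 'PhET', author: 'John Doe' } )` returns an empty array, while a third-party entry that omits `license` yields one problem naming that field.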

samreid commented 9 years ago

If we choose a style of JSON that matches the 3rd party code contributions, it will be a good step toward #180 unifying code/art 3rd party support.

samreid commented 9 years ago

I'll have to switch to another task for a bit, but thought I'd leave some notes here. For converting txt=>json, I was writing Java code in AnnotationParser.java (from our svn) like so:

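    // Recursively walk the directory tree, looking for license.txt files to convert.
    public static void visit( File file ) {
        if ( file.isDirectory() ) {
            File[] fileList = file.listFiles();
            if ( fileList == null ) {
                return; // listFiles() returns null if the directory cannot be read
            }
            for ( File child : fileList ) {
                visit( child );
            }
        }
        else if ( file.getName().equals( "license.txt" ) ) {
            System.out.println( "Found license.txt file at: " + file.getAbsolutePath() );
            try {
                String s = FileUtils.loadFileAsString( file );

                // Split the file into lines; parsing each line and writing JSON is still to be done.
                StringTokenizer st = new StringTokenizer( s, "\n" );
            }
            catch( IOException e ) {
                e.printStackTrace();
            }
        }
    }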

    public static void main( String[] args ) {
        Annotation a = AnnotationParser.parse( "test-id name=my name age=3 timestamp=dec 13, 2008" );
        System.out.println( "a = " + a );

        File root = new File( "/Users/samreid/github" );
        visit( root );
    }

Obviously we'll need to add a bit more code here :smiley:

samreid commented 9 years ago

Incremental progress in https://phet.unfuddle.com/a#/projects/9404/repositories/23262/commit?commit=74402

samreid commented 9 years ago

I'm going to temporarily disable the plugin license.txt requirements while I am incrementally porting TXT => JSON

samreid commented 9 years ago

To maintain history, it looks like I will need to do this as a 2-step process, converting the contents of the file before changing the filename.

I'll also want to make sure the process for handling the JSON works well before getting too far so I can make sure we won't require systemic changes after this batch conversion.

samreid commented 9 years ago

After I'm done here, I should double check the projectURL and text fields for 3rd party images & audio.

EDIT: done

samreid commented 9 years ago

I've finished converting license.txt to license.json. A summary of what was done:

Reused JSON schema from the 3rd party code contributions, so they will match. For example, here is an annotated public domain image:

  "cement-texture-dark.jpg": {
    "text": [
      "Public Domain"
    ],
    "projectURL": "http://www.public-domain-image.com/full-image/textures-and-patterns-public-domain-images-pictures/concrete-texture-public-domain-images-pictures/cement-texture.jpg-royalty-free-stock-image.html",
    "license": "Public Domain",
    "notes": ""
  }

and here is an annotated PhET image:

  "explore-icon.png": {
    "text": [
      "Copyright 2002-2015 University of Colorado Boulder"
    ],
    "projectURL": "http://phet.colorado.edu",
    "license": "contact phethelp@colorado.edu",
    "notes": "created by John Blanco"
  }

The text field states the copyright, if any. The projectURL is where the image/audio was obtained from; for a PhET image/audio, it must read http://phet.colorado.edu. The license field names the license. The notes field gives any additional or helpful information, such as whether the image was modified from another source, who created it, etc.
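
As a rough illustration of the schema just described, the sketch below checks the shape of a single entry. The function name `checkLicenseEntry` is hypothetical, and treating all four fields as required is an assumption based on the two examples above (both include `notes`, even when empty).

```javascript
// Hypothetical shape check for one license.json entry in the converted format.
// Returns an array of problem strings; an empty array means the shape is as expected.
function checkLicenseEntry( filename, entry ) {
  var problems = [];

  // "text" is an array of strings, e.g. [ "Public Domain" ]
  var textIsValid = Array.isArray( entry.text ) && entry.text.every( function( line ) {
    return typeof line === 'string';
  } );
  if ( !textIsValid ) {
    problems.push( filename + ': "text" must be an array of strings' );
  }

  // The remaining fields are plain strings; "notes" may be the empty string.
  [ 'projectURL', 'license', 'notes' ].forEach( function( field ) {
    if ( typeof entry[ field ] !== 'string' ) {
      problems.push( filename + ': "' + field + '" must be a string' );
    }
  } );

  return problems;
}
```

Both example entries above pass this check; an entry with no `notes` key at all would be reported.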

I think the only thing left to do before closing this issue is to run grunt-all.sh and see what problems come up.

samreid commented 9 years ago

I ran grunt-all.sh build-no-lint and saw related errors in:

john travoltage
making-tens
build-a-molecule

samreid commented 9 years ago

I resolved the John Travoltage issue and the Snow Day Math issue (see https://github.com/phetsims/making-tens/issues/25). @jonathanolson still needs to comment on the Build a Molecule issue but this is tracked in https://github.com/phetsims/build-a-molecule/issues/75 so I'm ready to close this issue.

pixelzoom commented 9 years ago

Reopening.

The description of fields in the first comment of this issue says:

notes: Optional for all entries, misc notes about the resource.

If notes is optional, then why do we have 111 occurrences of "notes": "" in license.json files?

If notes is optional, then the implementation doesn't reflect that. Specifically line 149 of createThirdPartyReport.js:

    var lines = [
      '**' + library + '**',
      json[ library ].text.join( '<br>' ),
      json[ library ].projectURL,
      'License: [' + json[ library ].license + '](licenses/' + library + '.txt)',
      'Notes: ' + json[ library ].notes
    ];
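
For illustration only, here is one way the report lines could treat notes as optional. With the code quoted above, an entry that has no `notes` key would produce the line "Notes: undefined", because concatenating a string with `undefined` yields that text. The wrapper function `buildReportLines` is hypothetical and is not the code in createThirdPartyReport.js.

```javascript
// Hypothetical variant of the report-line construction that tolerates missing or empty notes.
// json maps each library name to its license.json entry.
function buildReportLines( library, json ) {
  var entry = json[ library ];
  var lines = [
    '**' + library + '**',
    entry.text.join( '<br>' ),
    entry.projectURL,
    'License: [' + entry.license + '](licenses/' + library + '.txt)'
  ];

  // Only emit a Notes line when notes is present and non-empty.
  if ( entry.notes ) {
    lines.push( 'Notes: ' + entry.notes );
  }
  return lines;
}
```

With this shape, entries could omit `notes` entirely instead of carrying `"notes": ""`.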

pixelzoom commented 9 years ago

Where are the fields in license.json documented (other than in the first comment of this issue)? Should they be documented in the header comment of createThirdPartyReport.js? Or some other grunt task?

samreid commented 9 years ago

I decided to make "notes" required to simplify the json schema and to encourage people to add notes for new images & audio.

The fields are documented in https://github.com/phetsims/simula-rasa/blob/master/images/README.txt

I chose that location since it is the template that ends up "creating" most of the images directories. Where do you recommend to put this information?

Here's a copy of the README.txt for reference:

Image files belong in this directory. Each image must have an entry in license.json which indicates the origin of the image as well as its licensing. If this directory has subdirectories, each subdirectory must have its own license.json file.

The license.json file should contain one entry per file, and each should be annotated with the following:

text - copyright statement or "Public Domain"
projectURL - the URL for the resource
license - the name of license, such as "Public Domain"
notes - additional helpful information about the resource, or ""

For an example, please see any of the license.json files in a PhET simulation's image directory.

pixelzoom commented 9 years ago

Where do you recommend to put this information?

Recommended to put it in the grunt task that reads it. See for example setThirdPartyLicenses.js, which describes the format of sherpa/lib/license.json.

samreid commented 9 years ago

The primary file that uses it in chipper is getLicenseInfo.js, and there is already a reference to https://github.com/phetsims/simula-rasa/blob/master/images/README.txt in there:

/* 
 * The classification is one of: missing-license.json, not-annotated, phet or third-party
 * isProblematic indicates whether the particular license is compatible with PhET's licensing
 * entry: the object that appears in the license.json file, see
 * https://github.com/phetsims/simula-rasa/blob/master/images/README.txt
 */

getLicenseInfo is not a grunt task, but a utility called by createImageAndAudioLicenseReport, and also used by createSimSpecificThirdPartyReport. I'm not a fan of duplicating this documentation in >1 place, can you help me determine where it should live specifically?
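
To make the four classifications in that comment concrete, here is a minimal sketch. It is not the code in getLicenseInfo.js: the function name `classifyResource` is hypothetical, and using the projectURL to tell PhET resources from third-party ones is an assumption drawn from the rule that PhET entries must use http://phet.colorado.edu. The isProblematic check is left out because its criteria are not described in this thread.

```javascript
// Hypothetical classifier for a media file, given the parsed license.json for its directory.
// licenseJSON is null when the directory has no license.json file.
function classifyResource( licenseJSON, filename ) {
  if ( !licenseJSON ) {
    return 'missing-license.json';
  }

  var entry = licenseJSON[ filename ];
  if ( !entry ) {
    // license.json exists, but has no entry for this file
    return 'not-annotated';
  }

  // Assumption: PhET-owned resources are identified by their projectURL.
  return entry.projectURL === 'http://phet.colorado.edu' ? 'phet' : 'third-party';
}
```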

pixelzoom commented 9 years ago

Problems with relying on documentation in simula-rasa/images/README.md:

(1) The description of license.json pertains to all media types, not just images.

(2) Suppressing the propagation of README.md (https://github.com/phetsims/simula-rasa/issues/5) is additional work.

(3) Until propagation of the README.md is suppressed, copies of this file will proliferate - effectively duplicating the documentation.

(4) I would be unlikely to go looking for this info in simula-rasa/images/README.md. If that file needs to be referenced in the getLicenseInfo.js documentation, why not just put the documentation in getLicenseInfo.js? That would eliminate problems (1), (2) and (3) above.

samreid commented 9 years ago

I deleted simula-rasa/images/README.md in https://github.com/phetsims/simula-rasa/issues/5 and moved the documentation to getLicensingInfo.js in e4e9f2eee42ed0d17ef0fd7e88bf7f500a2d11f5

@pixelzoom can you take a look at your convenience?

pixelzoom commented 9 years ago

:+1: Closing.

I tweaked a few things in the license.json doc, @samreid review if you'd like.

samreid commented 9 years ago

The tweaks look good to me.

phetsims / chipper

Convert license.txt to JSON #181
