platformio / platformio-libmirror

PlatformIO libraries mirror
Apache License 2.0

"library.json" for mbed libs #10

Open ivankravets opened 9 years ago

ivankravets commented 9 years ago

Hey @gandy92 and @valeros ,

We announced mbed support earlier. However, support for mbed libs would be great! This can be done by parsing the mbed web search results:

http://developer.mbed.org/code/

[Code] -> [Most popular code] -> [Filter by type: Library]. The final URL:

http://developer.mbed.org/search/?q=&selected_facets=obj_type_exact%3ACode+Repository&repo_type=Library&order_by=-import_count

Here we have:

For example http://developer.mbed.org/users/simon/code/TextLCD/

{
    "name": "TextLCD",
    "description" :  "TextLCD library for controlling various LCD panels based on the HD44780 4-bit interface",
    "keywords": "HD44780, TextLCD",
    "authors":
    {
        "name": "Simon Ford",
        "url": "http://developer.mbed.org/users/simon/"
    },
    "repository":
    {
        "type": "hg",
        "url": "http://developer.mbed.org/users/simon/code/TextLCD/"
    },
    "frameworks": "mbed",
    "platforms":
    [
        "freescalekinetis",
        "nordicnrf51",
        "nxplpc",
        "ststm32"
    ]
}

Then we can host these library.json files in our platformio-libmirror/mbed repository.

It would be a GREAT addition not just for PlatformIO, but also for the whole embedded community.

Friends, what do you think?

gandy92 commented 9 years ago

I just had a brief first look at the search query results and how to extract project urls. Apparently, the mbed code repository is organized in teams and groups, forming urls like the following:

/users/simon/code/SRF05/
/users/tlunzer/code/TextLCD/
/teams/Bluetooth-Low-Energy/code/BLE_API/
/users/simon/code/Terminal/
/users/erik_kedo/code/NewTextLCD/
/users/SomeRandomBloke/code/SDFileSystem/
/teams/Nordic-Semiconductor/code/nRF51822/
/users/chris/code/MSCFileSystem/
/users/AjK/code/DebounceIn/

From this I extracted results from the first 100 pages, with 20 results per page. With no limitations on queries per minute, this can be done in a short while.
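A minimal sketch of how the repository paths listed above could be split into owner type, owner, and library name. The regular expression and the `parse_repo_path` helper are illustrative assumptions based only on the path shapes shown here, not the code actually used for the extraction.

```python
import re

# Matches paths such as /users/simon/code/TextLCD/ or
# /teams/Nordic-Semiconductor/code/nRF51822/ (shape assumed from the list above).
REPO_PATH_RE = re.compile(r"^/(users|teams)/([^/]+)/code/([^/]+)/?$")


def parse_repo_path(path):
    """Split an mbed repository path into owner type, owner and library name."""
    match = REPO_PATH_RE.match(path)
    if match is None:
        return None  # not a code repository path
    owner_type, owner, name = match.groups()
    return {"owner_type": owner_type, "owner": owner, "name": name}


paths = [
    "/users/simon/code/SRF05/",
    "/teams/Bluetooth-Low-Energy/code/BLE_API/",
    "/users/simon/",  # a profile page, not a repository
]
parsed = [parse_repo_path(p) for p in paths]
```

Returning `None` for non-matching paths lets the caller skip profile or team pages without raising.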

I did not yet start parsing the library pages but from what I can see we have access to additional information:

Let's assume we find all the information we need to automagically fill in the library.json file. How do we proceed?

Please comment

ivankravets commented 9 years ago

Thanks for the great review.

Dependencies (see http://developer.mbed.org/users/embeddedartists/code/EALib/); we may need to parse another page if we see a link to "more..."

Great! We should parse and use it too.

Platforms? There is some button "Export to desktop IDE" providing dropdown lists for platform (always empty) and toolchain (appears to always be the same list)

This data is obsolete and doesn't show appropriate information. We will not use it.

Examples seem to be missing, but there are Dependents which use the library

Hm. Looks fine, we can use these "Dependents" as "examples" (maximum 10 examples ordered by "imports"), which will contain URLs to lib examples. For example:

Lib: http://developer.mbed.org/users/simon/code/TextLCD/ Dependents: http://developer.mbed.org/users/simon/code/TextLCD/dependents

{
   "name": "TextLCD",
   "examples":
   [
       "http://developer.mbed.org/users/simon/code/TextLCD/",
       "http://developer.mbed.org/users/Schueler/code/analog_test/",
      "http://developer.mbed.org/users/ytsuboi/code/AVR_standalone_writer/",
      ....
   ]
}
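The "maximum 10 examples ordered by imports" rule could be sketched as follows. The input shape (a list of dicts with `url` and `imports` keys) and the import counts are made-up placeholders for illustration; the real counts would come from the dependents page.

```python
def build_examples(dependents, limit=10):
    """Return up to `limit` dependent URLs, most-imported first."""
    ranked = sorted(dependents, key=lambda d: d["imports"], reverse=True)
    return [d["url"] for d in ranked[:limit]]


# The import counts below are invented for this sketch.
dependents = [
    {"url": "http://developer.mbed.org/users/Schueler/code/analog_test/", "imports": 3},
    {"url": "http://developer.mbed.org/users/ytsuboi/code/AVR_standalone_writer/", "imports": 12},
]

manifest = {"name": "TextLCD", "examples": build_examples(dependents)}
```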

Let's assume we find all the information we need to automagically fill in the library.json file. How do we proceed?

I agree with your steps. I have only one proposal: parse libs with dependents "recursively". This should fix your last step.
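One possible reading of "recursively" is to follow each library's linked libraries until nothing new turns up. The sketch below assumes a caller-supplied `get_dependencies` function (hypothetical; in practice it would fetch and parse a page) and guards against cycles with a `seen` set.

```python
def collect_libs(root, get_dependencies, seen=None):
    """Visit `root` and everything reachable from it exactly once."""
    if seen is None:
        seen = set()
    if root in seen:
        return seen  # already visited; also breaks dependency cycles
    seen.add(root)
    for dep in get_dependencies(root):
        collect_libs(dep, get_dependencies, seen)
    return seen


# Hypothetical dependency graph standing in for real page requests.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
libs = collect_libs("A", lambda lib: graph.get(lib, []))
```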

gandy92 commented 9 years ago

Thank you for your feedback. You are absolutely right about the last point; it only occurred to me later.

I am still uncertain about how to fill in the platform list in the library manifest file. As far as I see it, mbed is all about different boards and platforms, much like the arduino ecosystem and all the others. There should be libraries built upon some abstraction layers making them usable on most platforms, but others may be too tightly knit to specific platforms or boards to be of general use. Or am I missing an important point, here? Maybe the platforms need to be filled in manually, in which case automated tests would be brilliant (I know there are open issues covering this, but then we probably would not need this right from the start)

There are a few things to consider. Bash scripts have worked fine so far, but this may not necessarily be the case for the task at hand. Perl may do the job better. Also, extracting the information from the webpages with grep (or a regexp/HTML parser when using Perl) tends to break easily when the HTML layout changes. I assume we do not want to stop importing libraries after the first run, and we want to keep administration of the import scripts to a minimum, right :smile: Access to the mbed database would solve these issues, but I haven't found any mention of an API yet. Anyway, I will start with a few tests and see where that takes me.

ivankravets commented 9 years ago

I am still uncertain about how to fill in the platform list in the library manifest file.

It's simple! :blush: Just use manually filled fields which are based on http://docs.platformio.org/en/latest/frameworks/mbed.html

{
   "frameworks": "mbed",
   "platforms": 
   [
       "freescalekinetis",
       "nordicnrf51",
       "nxplpc",
       "ststm32"    
   ]
}

There are a few things to consider...

I don't have experience with Perl; Bash is good enough for me as an "end-user". To my mind, Perl is the winner for this task. However, how about Python and http://scrapy.org ? :smile: It should save you a lot of free time.

Access to the mbed database would solve these issues, but I havent found any mention of an API, yet.

I don't know why they don't want to open an API for end-developers. A lot of companies shout "we are open-source", but in practice it looks different. Maybe no one has asked them about an API :smile:

gandy92 commented 9 years ago

My first scrapy script now returns items of the kind

        {'authors': [u'Simon Ford'],
         'description': [u'\nTextLCD library for controlling various LCD panels based on the HD44780 4-bit interface\n'],
         'frameworks': 'mbed',
         'keywords': [u'HD44780', u'TextLCD'],
         'name': [u'TextLCD'],
         'platforms': ['freescalekinetis', 'nordicnrf51', 'nxplpc', 'ststm32'],
         'repository': [u'/users/simon/code/TextLCD/']}

All taken from the search result or filled in with static values

What I now need to figure out is

ivankravets commented 9 years ago

Wow! Looks good for a Python beginner :+1: :blush:

Could I ask you to share this code somewhere?

reduce lists with only one entry from ['A'] to 'A'

print(my_list[0])  # first item

print(", ".join(my_list))  # join all items into one string

remove spurious \n from strings

my_str.strip()

how to include results from calling the %repo_url%/dependencies webpage

I described above:

{
   "name": "TextLCD",
   "examples":
   [
       "http://developer.mbed.org/users/simon/code/TextLCD/",
       "http://developer.mbed.org/users/Schueler/code/analog_test/",
      "http://developer.mbed.org/users/ytsuboi/code/AVR_standalone_writer/",
      ....
   ]
}

Please include links to "libs/examples" which use the root lib.

how to transform the 'repository': [u'/users/simon/code/TextLCD/'] to 'repository': { "type": "hg", "url": "/users/simon/code/TextLCD/" }

my_dict = {
    'repository': ['/users/simon/code/TextLCD/']
}

# Replace the scraped list with the dict that library.json expects.
my_dict["repository"] = {"type": "hg", "url": "http://developer.mbed.org/users/simon/code/TextLCD/"}
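The individual fixes above (take the first list entry, strip stray newlines, join keywords, wrap the repository path) could be combined into one pass. This is only a sketch: `first_clean` and `normalize` are hypothetical helper names, and the input is the item shape printed by the first Scrapy script earlier in the thread.

```python
BASE_URL = "http://developer.mbed.org"


def first_clean(value):
    """Reduce ['A'] to 'A' and strip surrounding whitespace/newlines."""
    if isinstance(value, list):
        value = value[0] if value else ""
    return value.strip()


def normalize(item):
    """Turn a raw scraped item into a library.json-style dict."""
    return {
        "name": first_clean(item["name"]),
        "description": first_clean(item["description"]),
        "keywords": ", ".join(k.strip() for k in item["keywords"]),
        "authors": {"name": first_clean(item["authors"])},
        "repository": {
            "type": "hg",
            "url": BASE_URL + first_clean(item["repository"]),
        },
        "frameworks": item["frameworks"],
        "platforms": list(item["platforms"]),
    }


item = {
    "authors": ["Simon Ford"],
    "description": ["\nTextLCD library for controlling various LCD panels based on the HD44780 4-bit interface\n"],
    "frameworks": "mbed",
    "keywords": ["HD44780", "TextLCD"],
    "name": ["TextLCD"],
    "platforms": ["freescalekinetis", "nordicnrf51", "nxplpc", "ststm32"],
    "repository": ["/users/simon/code/TextLCD/"],
}
manifest = normalize(item)
```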

maybe remove the u in front of all the string constants?

No, no, no! :smile: You should not bother with it. Just dump your dictionary to JSON file: https://docs.python.org/2/library/json.html

import json

my_dict = {
    "name": "TextLCD",
    "description" :  "TextLCD library for controlling various LCD panels based on the HD44780 4-bit interface",
    "keywords": "HD44780, TextLCD",
    "authors":
    {
        "name": "Simon Ford",
        "url": "http://developer.mbed.org/users/simon/"
    },
    "repository":
    {
        "type": "hg",
        "url": "http://developer.mbed.org/users/simon/code/TextLCD/"
    },
    "frameworks": "mbed",
    "platforms":
    [
        "freescalekinetis",
        "nordicnrf51",
        "nxplpc",
        "ststm32"
    ]
}

with open("/path/to/library.json", "w") as f:
    json.dump(my_dict, f, indent=4)

gandy92 commented 9 years ago

My learning curve mostly involved XPath; the Python part was straightforward :smile:

I'd like to do as much as possible in the context of the scrapy framework, what I need to figure out is mostly about finding the right hooks and class interactions, like input/output processors etc.

Also, resolving the extracted urls to names of already registered PlatformIO libs is necessary for

  1. determining if a lib in the top-list is already registered and can be omitted
  2. resolving library names in the dependency graph and determining if one of those requires registering

Before sharing the code I'll try to achieve the following:

I am trying to work towards a solution that is suitable for automation, at least to an extent where the script collects its data and suggests a (manageable) number of autogenerated manifest files that are ready for registration and that will, once registered, not contain inconsistencies. The first milestone, however, will be to prepare the information and have a few intermediate steps done manually.

gandy92 commented 9 years ago

Just committed and pushed a new version of the spider. It starts with the search request, from which it extracts the first 20 search results. Up to urls_max of these are requested one by one, and each response is transformed into an internal proto-manifest. Subsequent requests for the subpages dependents/ and dependencies/ are used to fill in examples and dependencies, respectively.

For each dependency, additional requests are created for its project page so that library manifest files are generated for the dependencies as well.

In the latest test run, the top 5 mbed libraries yielded 9 manifest files, all stored in configs/mbed/ for review.

Please check and comment.

I propose the following steps for commencing mbed library registration:

For the last step I see two options:

Please comment

ivankravets commented 9 years ago

filter the list of top libraries to remove all libraries already registered with the PlatformIO library manager

I propose to ignore this checking. Please don't mix "PlatformIO Library Registry" and "platformio-libmirror". In "platformio-libmirror" we will keep library.json files which are not available/ready for the specified libs. Then we will decide which libs should be registered in PIO Registry.

Please check and comment. I propose the following steps for commencing mbed library registration:

To my mind, this should be done manually. Initially, we have to verify all libs and register them manually using platformio lib register. If you don't have time for it, we will ask @valeros. I hope I can also find time for a few dozen libs.

ivankravets commented 9 years ago

In the latest test run, the top 5 mbed libraries yielded 9 manifest files, all stored in configs/mbed/ for review.

  1. We won't register MBED CORE libs in the PIO Library Registry. They are prebuilt with framework-mbed by default. The problem is related to the compilation process: each core lib needs its own specific build approach. However, we can keep library.json for the core libs in our repo. Just skip them at the "register" step.
  2. Let's discuss your first non-core lib: https://github.com/platformio/platformio-libmirror/blob/master/configs/mbed/TextLCD_SimonFord.json

The information looks great! I don't have any objections here.


Do we have any non-discussed questions?

gandy92 commented 9 years ago

Ok, so let's focus on the mbed core libraries for a minute: how do I identify them? possible approaches:

we can of course keep manifests for the core libraries and not register them, but what is the point to that?

and: do we want to list a core library as a dependency or not?

Depending on how we want to handle core libs, I will modify the script and then I will start to register the first few libs to see how things work out.

ivankravets commented 9 years ago

Ok, so let's focus on the mbed core libraries for a minute: how do I identify them?

We should not identify libs. We should generate library.json for the all TOP libs. "libmirror" repo exists for the all people/developers, not just for PlatformIO Library Registry. If someone decides to implement own library manager, he can use our "library.json" files. We will register manually all libs excluding related to core.

Ok, let's imagine that we don't have "CORE" libs.

we can of course keep manifests for the core libraries and not register them, but what is the point to that?

This is for the future. Maybe someone will decide to implement their own library manager within an IDE, or their own cross-platform builder, which will handle these core libs. If it is difficult to parse these core libs, please skip them.

do we want to list a core library as a dependency or not?

The PlatformIO source code builder automatically parses mbed core libs and includes them. See the example https://github.com/platformio/platformio/tree/develop/examples/mbed/mbed-rtos I propose not to include them in library.json

I will start to register the first few libs to see how things work out.

Hm... Could I ask you to wait with mbed library registration? Sorry, I've not implemented "examples parser" and hg repo yet :cry: . I mean, that "PlatformIO Library Crawler" isn't ready for mbed-based library.json files.

However, you can generate the first 100 or more libs and we will validate them manually.

P.S: I will implement mbed-support for crawler as soon as possible.

gandy92 commented 9 years ago

We should not identify libs. We should generate library.json for the all TOP libs. "libmirror" repo exists for the all people/developers, not just for PlatformIO Library Registry. If someone decides to implement own library manager, he can use our "library.json" files. We will register manually all libs excluding related to core.

Agreed, but still this leaves the question: How do I (or the script) know, which library is core and which not? Especially when it comes to excluding them from the dependencies list:

I propose not to include them in library.json

I can get an answer to the question by installing framework-mbed and looking at the libs directory and try to make a list that I can use as a blacklist when registering libraries.
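The blacklist idea could look roughly like this. The names in `CORE_LIBS` are placeholders only; the real set would have to be read from the framework-mbed package. The dependency shape (a list of dicts with a `name` key) is likewise an assumption for the sketch.

```python
# Placeholder names; the actual core libs must be taken from framework-mbed.
CORE_LIBS = {"mbed", "mbed-rtos"}


def drop_core_dependencies(dependencies, core_libs=CORE_LIBS):
    """Return the dependencies that are not on the core-lib blacklist."""
    return [dep for dep in dependencies if dep["name"] not in core_libs]


dependencies = [
    {"name": "mbed"},
    {"name": "TextLCD"},
    {"name": "mbed-rtos"},
]
filtered = drop_core_dependencies(dependencies)
```

Keeping the blacklist as a parameter means the same function works for other tools whose set of core libraries differs.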

How about this:

PlatformIO source code builder automatically parses mbed core libs

So there is some sort of mechanism for identifying core libs. But this does not necessarily have to live in libmirror, I see that now.

Hm... Could I ask you to wait with mbed library registration? Sorry, I've not implemented "examples parser" and hg repo yet :cry: . I mean, that "PlatformIO Library Crawler" isn't ready for mbed-based library.json files.

Absolutely, so let's try to manually validate the files. Let's try to follow a step-by-step approach where we validate

However, you can generate the first 100 or more libs and we will validate them manually.

Would manual validation include some sort of parser/lexical analyzer that can validate the json structure, maybe more?

ivankravets commented 9 years ago

Agreed, but still this leaves the question: How do I (or the script) know, which library is core and which not? Especially when it comes to excluding them from the dependencies list

See this list https://github.com/platformio/platformio/blob/develop/platformio/builder/scripts/frameworks/mbed.py#L59:L66

How about this: We include all dependencies we can find in the manifest files and do not care if it is a PIO core library or not (so we don't have to identify anything nor have to decide at this point)

How difficult is it to exclude "core libs" from the dependencies list? Working with the "mbed" framework assumes that "core libs" are automatically included in the build process. The "mbed online" compiler behaves this way. We should preserve the same behaviour and keep mbed-based examples working within PlatformIO.

The PIO library manager will decide on its own how to treat a dependency: If it knows the dependency is a core library it does nothing, otherwise it will install the dependency library.

The PIO Library Manager (which is used for lib installation) should not bother with any "core libs" or other rules of "ours". It should rely on library.json and its dependencies field. If we put into library.json the information that a library depends on a "core lib" which is not registered in the PIO Library Registry, then the user will have problems with lib installation.

This has the advantage that no manifest file has to be changed if a library becomes a core library. Another advantage: manifest files can be used by non-PIO software where the collection of core libraries may be different from that of PIO.

I agree with you, but this isn't my requirement. Please look into the mbed examples and you will see that they contain just a single line, #include <mbed.h>, which assumes that we include the core mbed libs, etc.

Please install any mbed platform and look into the "framework-mbed" package. I hope you will find all the answers there. Also, please look into the PlatformIO framework-mbed builder.

Would manual validation include some sort of parser/lexical analyzer that can validate the json structure, maybe more?

If you use Python's dict type and json.dump(), then you will have correct file structure. Please correct me if I don't understand your question.

gandy92 commented 9 years ago

Thank you for pointing this out, I guess I have all the information I need to go on.

just to summarize this discussion:

ivankravets commented 9 years ago

just to summarize this discussion:

Sure! Thanks in advance!

gandy92 commented 9 years ago

Hi, I committed the new version and generated manifest files for the top 10 mbed libs (not counting those in mbed-core) for review.

ivankravets commented 9 years ago

Hi @gandy92,

We've started work on MBED Crawler for library.json. I hope the first implementation will be published in the next week.

Great job! :+1: Thanks! :beer:

ivankravets commented 9 years ago

  1. I've just added the teensy platform to the mbed-supported list. Please re-run Scrapy.
  2. I see that there are libs which don't contain the keywords field. What will we do with these libs? According to the library.json specification, this field is required.

gandy92 commented 9 years ago

Great, thanks, :beer:

  1. I'll extend the list of platforms in the spider accordingly and re-run
  2. Interestingly, some of the manifest files lost keywords during the last run; I suspect a glitch in the XPath statements in the spider code. I haven't had a chance to look into this yet, though. This is how I extract keywords: after parsing the library page, I request the owner's page, where all projects are listed with keywords. Results from that page are cached, so another possible glitch could occur in the caching code. I'll check.

ivankravets commented 9 years ago

Thank you too! I'm going to register a few mbed libs today :) I'm testing them on my local machine now. I will report here when the mbed libs are added.

ivankravets commented 9 years ago

@gandy92 awesome job!!!!!! Thanks!!!

http://platformio.org/#!/lib/search?query=framework%253Ambed&page=1

I propose to start registering of mbed libs which have understandable library.json.

gandy92 commented 9 years ago

nice :+1:

I temporarily deactivated the caching mechanism and took a closer look at manifest files without keyword. This is what's happening:

ivankravets commented 9 years ago

lists BLE_API which has no keywords attached. Question is, what else should we use for keywords? Thing is, I have no idea where else to scrape keywords for a library. At least I could not find any ...

I propose adding functionality to your parser which will check https://github.com/platformio/platformio-libmirror/tree/master/configs/mbed/moderation/ for library.json files pre-moderated by us. For example, if some lib has an invalid library.json, then we will move it to the moderation folder and correct it manually. Like /configs/mbed/moderation/BLE_API_Bluetooth-Low-Energy.json, etc.

I can try to extend to multiple pages

Will be thankful!

ivankravets commented 9 years ago

I'm registering the first few libs now. I've found 2 issues:

  1. Could I ask you to replace \n with 1 space and then collapse multiple spaces into 1 space (when multiple \n are detected)? Like, with a regexp:
import re
# \s+ also matches newlines, so one substitution handles both cases
description = re.sub(r"\s+", " ", description).strip()

Examples: https://github.com/platformio/platformio-libmirror/blob/master/configs/mbed/BMP085_Sugakoubou.json https://github.com/platformio/platformio-libmirror/blob/master/configs/mbed/ChaNFSSD_StefanMueller.json

  2. If a lib doesn't have the required fields keywords or description, automatically put it into the moderation folder?
  3. How difficult is it to implement something like we have for the GitHubTOP libs, where we can see which libraries are registered from the /mbed folder?

P.S: I've just registered first 65 libs from MBED list. The other 35 libs require moderation.
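The "missing required field means moderation" rule might be sketched like this. The `REQUIRED_FIELDS` tuple covers only the two fields named above, and `target_path` with its `configs/mbed` default is a hypothetical helper, not the spider's actual code.

```python
import posixpath

# Only the two fields mentioned in this thread; the full spec may require more.
REQUIRED_FIELDS = ("keywords", "description")


def missing_fields(manifest):
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(manifest.get(f) or "").strip()]


def target_path(manifest, filename, root="configs/mbed"):
    """Send incomplete manifests to the moderation subfolder."""
    if missing_fields(manifest):
        return posixpath.join(root, "moderation", filename)
    return posixpath.join(root, filename)


complete = {"keywords": "HD44780, TextLCD", "description": "TextLCD library"}
incomplete = {"keywords": "", "description": "BLE API"}

ok_path = target_path(complete, "TextLCD_SimonFord.json")
moderation_path = target_path(incomplete, "BLE_API_Bluetooth-Low-Energy.json")
```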

gandy92 commented 9 years ago

Thank you for the feedback! I just updated the code and improved a few manifest files. Multiple pages are now parsed when searching for keywords. description is not yet being cleaned up, this will be next

I will also try to separate files into moderation folder.

A script like for github-top should not be too difficult, I'll look into that.

ivankravets commented 9 years ago

  1. Please take a look into examples field in https://raw.githubusercontent.com/platformio/platformio-libmirror/master/configs/mbed/NewTextLCD_ErikKerger.json
  2. How about counting the length of the description? If it is greater than 255, this config should be moderated.

gandy92 commented 9 years ago

  1. Amazing, I'll look into that
  2. For starters, I tried a slightly different approach with commit 0f4d42b. Please review carefully as it also affects a few potentially moderated files. Revert if you don't agree. I do think however, it eliminates most unwanted contents in the description (urls, notes, etc)

gandy92 commented 9 years ago

  1. Fixed in commit bc7e071, this also fixed another comment field: https://raw.githubusercontent.com/platformio/platformio-libmirror/master/configs/mbed/MMA7361L_HiroshiYamaguchi.json

ivankravets commented 9 years ago

For starters, I tried a slightly different approach with commit 0f4d42b.

Thanks, now it looks great!


I see a few issues :( We have the moderation folder where the spider moves "non-valid libs". However, what about the changes which were made manually by a human? Yesterday I edited a few libs: b24e6db6a8ebf3be9934c1bc59cb49662b922543 and d6883158944ee8d291831b9c2373b6b81709b2f4. I see that the script has removed my changes :(

What do we want at this point?

  1. We can check whether an "invalid" lib already exists in the moderation folder; if so, we ignore it and do not overwrite it.
  2. Or, we can keep moderation/LibName_Author.json in the moderation folder. These JSON files will be updated each time you run the script. A human will then copy moderation/LibName_Author.json to moderation/LibName_Author_moderated.json and edit it. This allows us to stay up to date with the "invalid" libs, because in the future the owner of a lib can fix it and we can switch back to the original JSON. It also allows us to keep both versions of a lib in the moderation folder.
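Option 2 could be sketched as two small rules: the script only ever rewrites the generated files, and the hand-edited `_moderated.json` copy wins when choosing which manifest to use. The function names and the set-of-filenames input are assumptions for illustration.

```python
MODERATED_SUFFIX = "_moderated.json"


def files_safe_to_overwrite(moderation_files):
    """The spider may regenerate everything except hand-edited copies."""
    return sorted(f for f in moderation_files if not f.endswith(MODERATED_SUFFIX))


def pick_manifest(lib_id, moderation_files):
    """Prefer the human-edited copy; fall back to the generated one."""
    moderated = lib_id + MODERATED_SUFFIX
    generated = lib_id + ".json"
    if moderated in moderation_files:
        return moderated
    if generated in moderation_files:
        return generated
    return None


# Hypothetical folder contents, using the LibName_Author placeholder from above.
folder = {"LibName_Author.json", "LibName_Author_moderated.json", "Other_Lib.json"}

chosen = pick_manifest("LibName_Author", folder)
rewritable = files_safe_to_overwrite(folder)
```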

WDYT?

ivankravets commented 9 years ago

And yes, we will register moderation/LibName_Author_moderated.json with the PlatformIO Library Registry.

gandy92 commented 9 years ago

Sorry for the inconvenience, I did not think about that. But yes, it's a good point, we need a workflow where the automated scripts do not interfere with manual work.

I'm afk till sunday, but I'll try to think about this in the meantime.

ivankravets commented 9 years ago

Ok, then let's follow my proposal in 2. Good? I will move the edited libs to moderation/LibName_Author_moderated.json.

I'm afk till sunday

Have a nice weekend! :beer:

ivankravets commented 9 years ago

Hey @gandy92 :smile:

I've just found another issue, see commit 750a058c78cc389c39bd9d828aba97bcfabafcd4

You removed the ChaNFSSD_StefanMueller.json config, which was registered in the Library Registry. Maybe we should not remove previously generated configs? I understand that the TOP list changes regularly, but I'm not sure that I should remove libs which have moved from the TOP 100 to the TOP 200.

WDYT?

ghost commented 9 years ago

this might not be the right place, but I don't see any handling of mbed module.json extraIncludes directive as seen here: https://developer.mbed.org/teams/Nordic-Semiconductor/code/nRF51822/file/ca9c9c2cfc6a/module.json. What should be done for such libraries?

ghost commented 9 years ago

I'm asking, because I think it should be transformed to includes, unless I'm totally wrong about library.json