rafalcieslak / emacs-company-terraform

Company backend for terraform files
zlib License

generate.py needs to be updated for new documentation formats #12

Open aibou opened 4 years ago

aibou commented 4 years ago

Hi,

The current company-terraform-data.el does not include new or updated AWS services, such as the aws_eks_cluster resource or the mixed_instances_policy parameter of the aws_autoscaling_group resource.

So I tried running utils/generate.py to regenerate my company-terraform-data.el, but it failed with this error:

$ python utils/generate.py
Traceback (most recent call last):
  File "utils/generate.py", line 294, in <module>
    get_providers()
  File "utils/generate.py", line 99, in get_providers
    table = div_inner.find_all('table')[0]
IndexError: list index out of range

I think the reason is that the Terraform documentation pages have been updated and their DOM restructured. I tried to repair the script, but it is difficult for me because I don't know what the pages looked like before the update.

Could you please fix it so that it runs again?
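The IndexError means `div_inner.find_all('table')` returned an empty list: the page no longer contains the table the script expects, and the bare `[0]` crashes the whole run. A minimal sketch, assuming the scraper were restructured so that each page is handled by one extraction function: the hypothetical `first_match` helper turns the empty result into a descriptive error, and the hypothetical `scrape_all` loop logs and skips restructured pages instead of aborting. None of these names exist in generate.py; the toy input stands in for real `find_all` results.

```python
import logging


class LayoutChanged(Exception):
    """A documentation page no longer has the structure the scraper expects."""


def first_match(matches, what, page_url):
    """Return matches[0], or raise LayoutChanged naming the missing element and page.

    `matches` can be any sequence, e.g. the list returned by BeautifulSoup's find_all().
    """
    if not matches:
        raise LayoutChanged(f"expected at least one {what} on {page_url}, found none")
    return matches[0]


def scrape_all(pages, extract):
    """Apply extract(url, page) to every (url, page) pair.

    Pages whose layout changed are logged and skipped, so one restructured
    page does not abort the whole run.
    """
    results, skipped = {}, []
    for url, page in pages:
        try:
            results[url] = extract(url, page)
        except LayoutChanged as err:
            logging.warning("skipping page: %s", err)
            skipped.append(url)
    return results, skipped


if __name__ == "__main__":
    # Toy input: each "page" is just the list find_all('table') would return.
    pages = [
        ("https://example.invalid/aws", ["<table 1>", "<table 2>"]),
        ("https://example.invalid/restructured", []),
    ]
    results, skipped = scrape_all(pages, lambda url, page: first_match(page, "<table>", url))
    print(results)  # {'https://example.invalid/aws': '<table 1>'}
    print(skipped)  # ['https://example.invalid/restructured']
```

In generate.py itself, the failing line 99 would become something like `table = first_match(div_inner.find_all('table'), '<table>', url)`, where `url` is an assumed name for whatever variable holds the page address there. This does not fix the parsing, but it reports which page changed and keeps the other providers regenerating.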

rafalcieslak commented 4 years ago

I'd love to, but it's likely to be a lot of effort in vain.

Generally, gathering the list of resources and attributes by parsing documentation web pages is a terrible idea. The documentation is human-readable, but it lacks a well-defined structure. Various providers use different formatting styles and different document structures, which makes the documentation a terrible source for automated processing. Thus generate.py is full of weird, unmaintainable tricks, specific rules, exception handling and special cases, all in an attempt to extract the structure of these documents. Completely bonkers idea.

But to be fair, terrible as it is, this seems to be the only possible way of extracting resource lists, their attributes and their descriptions. Some of the most prominent providers might have a more structured or consistent representation of their resources and attributes, but as long as some providers don't, this isn't a strategy worth pursuing. Many providers are updated very rarely, and I can't expect them all to adopt a shared machine-readable documentation format, though I believe this is an effort Terraform developers should pursue.
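To make the contrast with scraping concrete, here is a minimal sketch of what consuming a shared machine-readable format could look like. Everything in it is an assumption for illustration: the JSON shape, its field names, the placeholder docstrings, and the `build_index`, `complete_resource` and `complete_attribute` helpers are hypothetical, not the terraform-metadata format or anything Terraform or its providers publish.

```python
import json

# Illustrative only: a hypothetical per-provider schema file. The shape, field
# names and description strings are placeholders, not a real published format.
EXAMPLE_SCHEMA = """
{
  "provider": "aws",
  "resources": {
    "aws_eks_cluster": {
      "description": "Placeholder docstring for aws_eks_cluster.",
      "attributes": {
        "name": {"description": "Placeholder docstring for name."},
        "role_arn": {"description": "Placeholder docstring for role_arn."}
      }
    },
    "aws_autoscaling_group": {
      "description": "Placeholder docstring for aws_autoscaling_group.",
      "attributes": {
        "mixed_instances_policy": {"description": "Placeholder docstring."}
      }
    }
  }
}
"""


def build_index(schema_text):
    """Map each resource name to (docstring, {attribute_name: docstring})."""
    schema = json.loads(schema_text)
    index = {}
    for name, resource in schema["resources"].items():
        attributes = {
            attr: spec.get("description", "")
            for attr, spec in resource.get("attributes", {}).items()
        }
        index[name] = (resource.get("description", ""), attributes)
    return index


def complete_resource(index, prefix):
    """Resource names starting with prefix, sorted, as a completion backend might list them."""
    return sorted(name for name in index if name.startswith(prefix))


def complete_attribute(index, resource, prefix):
    """Attribute names of resource starting with prefix, sorted; empty if resource is unknown."""
    _, attributes = index.get(resource, ("", {}))
    return sorted(attr for attr in attributes if attr.startswith(prefix))


if __name__ == "__main__":
    index = build_index(EXAMPLE_SCHEMA)
    print(complete_resource(index, "aws_e"))                          # ['aws_eks_cluster']
    print(complete_attribute(index, "aws_autoscaling_group", "mix"))  # ['mixed_instances_policy']
```

With a format like this, the per-provider special cases disappear: adding a provider means dropping in another file of the same shape. Docstrings survive only if the format itself carries them.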

With that in mind, generate.py was never meant to be easily maintainable. Even small changes in the docs format require a lot of fixes in this script. Since many documentation styles have changed since I originally wrote it, I believe at this point it would be easier to rewrite the script entirely than to fix it... which obviously requires a lot of tedious work. Eventually I will have to do that, though!

I have also considered borrowing the data file from some other terraform plugin for other editors, but back when I last researched them, none of their data sets was comprehensive enough. I should revisit this idea.

rirze commented 4 years ago

@rafalcieslak Have you seen this repository maintained for the IntelliJ plugin? It seems like it could fit right in, with some changes to the parsing.

This way you don't have to maintain your own parsing tool.

rafalcieslak commented 4 years ago

Hi @rirze, this is an awesome suggestion! If there were indeed a repository that gathered resources, attributes, functions and other Terraform keywords, I would be very happy to use it instead of maintaining my own tool. That terraform-metadata repository (or its clones I've seen in other editors' plugins) is a step in the right direction, but it's not quite there yet. Most importantly, it does not provide docstrings for any resource, which company-terraform does offer. This is understandable, because documentation is not embedded within provider sources; it exists solely as HTML/Markdown files in the providers' repositories, which makes it difficult to extract automatically. Additionally, terraform-metadata is only partially automated: the lists of backends and interpolation functions cannot be updated automatically, and, for instance, it is missing all the functions introduced by Terraform 0.12 back in May.

So while I really hate having to maintain my own tool, I feel that, at this point, switching the schema source from a web scraper to terraform-metadata would be replacing one half-arsed tool with another. I hope this will change in the future, but I don't imagine it happening without a lot of work among Terraform providers to standardize the documentation format.

rirze commented 4 years ago

Hi @rafalcieslak, thanks for the in-depth explanation! I really appreciate it.

I started scouring other tools for documentation, and it looks like VSCode's plugin might have what you're looking for? It's important to note that its provider list is hopelessly incomplete, covering only the six most commonly used providers (aws, azure, gcloud, etc.).

On that note, it might be better to support only a subset of providers until HashiCorp comes out with a full API specification.