torch-points3d / torch-points3d

Pytorch framework for doing deep learning on point clouds.
https://torch-points3d.readthedocs.io/en/latest/

How can I help with 3D points and images? #549

Closed bw4sz closed 2 years ago

bw4sz commented 3 years ago

Hi all, thanks for your great work. I'm a researcher at University of Florida studying deep learning for biological applications (this kind of stuff: https://deepforest.readthedocs.io/). I like the paradigm you've put forward. This is the kind of thing that motivated me to make the switch from tensorflow.

The paper says

At the time of writing, superpoint-based [25, 24] and multi-modal methods (e.g. 3D points + images [11]) are not yet implemented. However, we plan to add both in the near future.

What is the status of multi-modal models? What models/datasets do you hope to use? I've got a benchmark dataset for tree detection in LiDAR + Hyperspectral + RGB that I'm publishing (https://github.com/weecology/NeonTreeEvaluation, https://www.biorxiv.org/content/10.1101/2020.11.16.385088v1). I'm starting to build joint models and wanted to ask how I can contribute and get involved here. If you set out some general guidelines and the current status, I'd love to contribute for more reproducibility.

I'm just checking out the repo now and I'll train a couple of models to see if it generalizes to tree detection. Our point density is very limited compared to traditional benchmarks. See some samples: http://tree.westus.cloudapp.azure.com/trees/

nicolas-chaulet commented 3 years ago

Hi @bw4sz I love that! I know we have a couple of people interested in that sort of thing in the community, so I'll ping them. The starting point is to create a custom dataset together with a collate transform; once that's in place you should be able to port your model over pretty easily. Let's wait to see if someone has already built something around that, and if not I can give more guidance. Thanks for the interest!
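To make the "custom dataset + collate transform" idea concrete, here is a minimal, hypothetical sketch of a multimodal collate in plain PyTorch; the dict keys and function name are illustrative, not the framework's API. It concatenates variable-sized clouds and tracks which points belong to which sample with a per-point batch index, the same convention PyTorch Geometric uses:

```python
import torch

# Hypothetical multimodal collate: batch variable-sized point clouds with a
# per-point batch index and stack the co-registered image crops.
# The "pos"/"image" keys are illustrative only.
def collate_points_and_images(samples):
    pos = torch.cat([s["pos"] for s in samples])         # (sum Ni, 3)
    batch = torch.cat([
        torch.full((s["pos"].shape[0],), i, dtype=torch.long)
        for i, s in enumerate(samples)
    ])                                                   # (sum Ni,)
    images = torch.stack([s["image"] for s in samples])  # (B, C, H, W)
    return pos, batch, images

samples = [
    {"pos": torch.rand(100, 3), "image": torch.rand(3, 64, 64)},
    {"pos": torch.rand(80, 3), "image": torch.rand(3, 64, 64)},
]
pos, batch, images = collate_points_and_images(samples)
```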

loicland commented 3 years ago

Hi,

We are currently working on a 2D+3D joint model based on sparse convolutions. The context is a bit different from yours since we are working with indoor/outdoor datasets, in which the image projection is not as straightforward as for aerial images. However, I am sure it could be adapted to this simpler case easily. In short, our new data format needs a mapping from points to pixels, which should be pretty straightforward to obtain.
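For the aerial case, the point-to-pixel mapping can be sketched with a simple affine geotransform; this is my assumption of what the mapping could look like (the function name, the raster origin `x0`/`y0` and resolution `res` are illustrative, not part of the torch-points3d data format):

```python
import numpy as np

# Hypothetical sketch: map LiDAR points to (row, col) pixel indices of a
# co-registered aerial raster, given its top-left corner (x0, y0) and
# ground resolution in metres per pixel.
def points_to_pixels(xyz, x0, y0, res):
    """Return (row, col) pixel indices for each 3D point."""
    cols = np.floor((xyz[:, 0] - x0) / res).astype(int)
    rows = np.floor((y0 - xyz[:, 1]) / res).astype(int)  # image rows grow downward
    return rows, cols

xyz = np.array([[763156.94, 4330837.93, 3.37],
                [763154.82, 4330838.39, 4.04]])
rows, cols = points_to_pixels(xyz, x0=763150.0, y0=4330840.0, res=0.1)
```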

The focus of your dataset is instance segmentation of trees, correct?

bw4sz commented 3 years ago

yup. I was just starting to format a dataset object for this repo. @nicolas-chaulet let me know if there is a preferred input type.

I converted our .las LiDAR files to a headerless txt with the format

X,Y,Z,Intensity,Label

where the Label is an integer 1...n, one per individual tree. On first scan here

https://torch-points3d.readthedocs.io/en/latest/src/tutorials.html#create-a-dataset-that-the-framework-recognises

It wasn't super obvious where the labels go in the parser, but I haven't yet looked at the detection/ folder examples. I can make a separate issue if you prefer, but we could work through this as an example and then I can do a pull request to add a bit of a demo.
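The .las-to-txt conversion described above can be sketched as follows; the arrays here are synthetic, and in practice they would come from a LAS reader such as laspy (e.g. `las = laspy.read("plot.las")`, then `las.x`, `las.y`, `las.z`, `las.intensity` plus a per-tree label attribute):

```python
import numpy as np

# Illustrative sketch of writing points in the X,Y,Z,Intensity,Label format.
# Synthetic values; real arrays would come from a .las reader.
xyz = np.array([[763156.94, 4330837.93, 3.37],
                [763156.22, 4330838.09, 3.58]])
intensity = np.array([0, 0])
label = np.array([1, 2])  # integer tree id, 1...n

rows = np.column_stack([xyz, intensity, label])
# One multi-field format string keeps the comma-separated, headerless layout
np.savetxt("BLAN_demo.txt", rows, fmt="%.2f,%.2f,%.2f,%d,%d")
```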

@loicland I remember your gated messenger networks paper with the superpoints group. The major difference with these datasets is the weaker annotation completeness (annotations are projected from RGB) and lower point density. Here is an example from my semi-supervised RGB retinanet, projected into the point cloud. No geometric learning yet.

[Screenshot, 2021-02-05: RGB retinanet detections projected into the point cloud]

Happy to contribute in any way; I'm glad to see the community start to coalesce. I am transitioning out of tensorflow/keras for this reason.

bw4sz commented 3 years ago

@nicolas-chaulet I can open other issues, but returning to this today. I'm not 100% sure I understand the API model. Let's say I have 100+ 'scans' of a plot, like the image above, in a folder, each in a .csv-style text file.

e.g.

BLAN_009.txt

"X","Y","Z","Intensity","label"
763156.94,4330837.93,3.37,0,0
763156.22,4330838.09,3.58,0,0
763155.49,4330838.25,3.72,0,0
763154.82,4330838.39,4.04,0,0

where label is a unique object identifier (integer) and each row is a point. Is this the correct format for object detection?

Can you confirm the desired workflow?

  1. Fork this repo? Or pip install and then subclass the classes below?
  2. Create a new geometric dataset class that loads a given scan (as in https://github.com/nicolas-chaulet/torch-points3d/blob/9966f2350e03165158ba40b9f203aae7a16d31aa/torch_points3d/datasets/segmentation/scannet.py#L629)
  3. Create a new PyTorch Geometric dataset (https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html)
  4. This dataset class would have a read_scan method that opens a given file (pd.read_csv) and then loads a torch tensor? Should each scan be batched? They aren't that large.
  5. Create a new config file? https://torch-points3d.readthedocs.io/en/latest/src/tutorials.html?highlight=dataset#create-a-new-configuration-file. I don't yet want to create a new model, just run an existing object detection model (https://github.com/nicolas-chaulet/torch-points3d/blob/9966f2350e03165158ba40b9f203aae7a16d31aa/torch_points3d/models/object_detection/votenet2.py). This relates to fork versus pip install; looking at the docs, you are using poetry to run train.py from the top dir.
  6. Run an existing model against this new dataset:
poetry run python train.py task=object_detection model_name=votenet dataset=myNewDataset

But then the repo is also a python package? Maybe just for users who want the datasets?

import glob
import os

import pandas as pd
import torch
from torch_geometric.data import Data, InMemoryDataset

class Crowns(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(Crowns, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        # Names relative to `self.raw_dir`, as pytorch geometric expects
        return [os.path.basename(f) for f in glob.glob("{}/*.txt".format(self.raw_dir))]

    @property
    def processed_file_names(self):
        return ['data.pt']

    def download(self):
        # Files are already on disk in `self.raw_dir`; nothing to download.
        pass

    def read_plot(self, path):
        """Read a scan and wrap it in a `Data` object (pos, intensity, label)"""
        df = pd.read_csv(path)
        pos = torch.tensor(df[["X", "Y", "Z"]].values, dtype=torch.float)
        x = torch.tensor(df[["Intensity"]].values, dtype=torch.float)
        y = torch.tensor(df["label"].values, dtype=torch.long)
        return Data(pos=pos, x=x, y=y)

    def process(self):
        # Read every scan into a list of `Data` objects.
        # (`raw_file_names` is a property; `raw_paths` gives the full paths.)
        data_list = [self.read_plot(p) for p in self.raw_paths]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

nicolas-chaulet commented 3 years ago

So, it depends whether you have some training scripts already. If not, then forking and using train.py is the best option.

The dataset you have posted looks good, but you would have to wrap your dataframes into a pytorch geometric Data object, with pos set to (x,y,z) and x to intensity. For object detection, and for votenet in particular, you need to do some more work to extract bounding boxes and other things; take a look at scannet for an example: https://github.com/nicolas-chaulet/torch-points3d/blob/master/torch_points3d/datasets/object_detection/scannet.py The list of required labels is declared here: https://github.com/nicolas-chaulet/torch-points3d/blob/9966f2350e03165158ba40b9f203aae7a16d31aa/torch_points3d/models/object_detection/votenet.py#L22
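To give a flavour of the extra box-extraction work, here is a hedged sketch (my assumption, not the scannet implementation linked above) that derives axis-aligned bounding boxes from the per-point tree ids in the txt format discussed earlier:

```python
import numpy as np

# Hedged sketch: derive one axis-aligned box per tree instance from
# per-point labels (label 0 = unlabelled background, as in the txt format).
def instance_boxes(pos, labels):
    """Return {label: (center, size)} for each instance."""
    boxes = {}
    for lab in np.unique(labels):
        if lab == 0:  # skip unlabelled points
            continue
        pts = pos[labels == lab]
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        boxes[int(lab)] = ((lo + hi) / 2.0, hi - lo)
    return boxes

pos = np.array([[0., 0., 0.], [2., 2., 4.], [10., 10., 1.], [12., 10., 3.]])
labels = np.array([1, 1, 2, 2])
boxes = instance_boxes(pos, labels)
```

The exact fields votenet expects (centers, sizes, per-point votes, etc.) are declared in the file linked above, so this dict would still need to be reshaped into that label list.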

Once that is done, you would just have to wrap that into a tp3d BaseDataset to define your train vs val split. A BaseDataset will handle batching for you and create the data loaders automatically. It also handles the logic for creating the data augmentation transforms that you might have defined in your myNewDataset.yaml file.

I hope this makes sense!

CCInc commented 2 years ago

I'm going to close for now, let me know of any updates!