voldemortX / pytorch-auto-drive

PytorchAutoDrive: Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, RESA, LSTR, LaneATT, BézierLaneNet...) based on PyTorch with fast training, visualization, benchmarking & deployment help
BSD 3-Clause "New" or "Revised" License

Roadmap #4

Open voldemortX opened 3 years ago

voldemortX commented 3 years ago

Roadmap for our users (to state feature requests) and contributors.

* Low priority tasks.

2022Q? (2022.4.2 - ):

2022Q1 (2022.1.10 - 2022.3.31):

2021Q4 (2021.10.1 - 2021.12.31):

2021Q3 (2021.7.1 - 2021.9.30):

2021Q2 (2021.4.1 - 2021.6.30):

2021Q1 (-2021.3.31):

voldemortX commented 3 years ago

The maintainers have rather low bandwidth these days. About half of the Q1 features remain unfinished and have been pushed to Q2. Any help would be much appreciated!

SikandAlex commented 2 years ago

I am interested in helping to support this library. I am in a position to add support for comma10k as well as GAN-based weather augmentations soon. I have previously trained this on BDD100K: https://github.com/hustvl/YOLOP

I really do not recommend this architecture. Although it seems very attractive, is very flexible in training, and learns all the tasks well, I could not convert it to TensorRT. I opened issues both there and in the tensorrtx repository, but the HUST students did not help me at all, and many files were missing from their repository.

See here:

https://github.com/hustvl/YOLOP/issues/12 https://github.com/wang-xinyu/tensorrtx/issues/793

I would also like to move these two papers into the Roadmap:

CondLaneNet: https://arxiv.org/abs/2105.05003
FOLOLane: https://arxiv.org/abs/2105.13680

I am an MS in AI candidate at Boston University (taking a semester off because courses are useless). I see that you are a student at SJTU, @voldemortX. I am a huge fan of your university and have read many papers from there. I regard them as the #1 world leaders in many computer vision application fields, particularly surveillance. I would enjoy working with you.

Right now I have trained DDRNet for real-time semantic segmentation on the comma10k dataset. I think the comma10k dataset is hugely useful for the community because it is fully permissive, so we can augment it with labels it is missing, new formats, etc. I will update when I can submit some code. I will release it slightly after I build it into my pipeline, to give my company a slight edge before I make it open-source. I do not have much experience with pull requests, but I will do my best.

voldemortX commented 2 years ago

@SikandAlex It will be an honor to have your help as well!

> I am interested in helping to support this library. I am in a position to add support for comma10k as well as GAN-based weather augmentations soon. I have previously trained this on BDD100K: https://github.com/hustvl/YOLOP

Support for new datasets is always welcome!

> I would also like to move these two papers into the Roadmap:
>
> CondLaneNet: https://arxiv.org/abs/2105.05003
> FOLOLane: https://arxiv.org/abs/2105.13680

CondLaneNet is open-sourced and could be easier to implement, while FOLOLane might prove a harder method that needs more work, since we do not yet have one of its backbones (BiSeNet).

I'll add them to the Roadmap for Q4, and they can of course continue into 2022. You can submit PRs whenever you have a ready-to-go chunk of code (e.g. you have implemented one of the dataset classes and tested its loading, or finished an algorithm). Thanks again for your help!

voldemortX commented 2 years ago

TensorRT support is also what @cedricgsh and I have talked about recently. We too agree that pytorch-auto-drive should not stop at being a research codebase. Our primary aim would be a TensorRT benchmark for model speed, plus op-based FLOPs calculation from fvcore. But given our current bandwidth, I think that would need to wait until at least 22Q1 (a refactor of the framework might be required).
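To make the "benchmark for model speed" idea concrete, here is a minimal, framework-agnostic timing sketch. It is not the benchmark code of pytorch-auto-drive. The names `benchmark` and `dummy_forward` are hypothetical, and `dummy_forward` is only a stand-in for a real forward pass. The sketch discards warm-up runs and reports the median latency, which is less sensitive to occasional slow iterations than the mean.

```python
import time
from statistics import median


def benchmark(fn, warmup=10, iters=100):
    """Return (median latency in seconds, FPS) for a zero-argument callable."""
    # Warm-up runs are discarded: first calls often include lazy init and caching.
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    med = median(latencies)
    fps = float("inf") if med == 0 else 1.0 / med
    return med, fps


def dummy_forward():
    # Hypothetical stand-in for model(input); replace with a real forward pass.
    return sum(i * i for i in range(10_000))


latency, fps = benchmark(dummy_forward, warmup=5, iters=50)
print(f"median latency: {latency * 1e3:.3f} ms, ~{fps:.1f} FPS")
```

When timing a PyTorch model on a GPU, CUDA kernels run asynchronously, so the callable should call `torch.cuda.synchronize()` before it returns. Otherwise the clock stops before the work has finished.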

SikandAlex commented 2 years ago

I have an AGX Xavier on hand, and I will hopefully be able to provide some benchmarks on it for certain models. Unfortunately, I'm no expert in TensorRT custom layers etc., and it seems many developers are unable to get certain operations supported.

There are so many papers that claim to reach a certain FPS when deployed to embedded GPUs, but they never release their code, or they release only testing code and no training code, or the results are not reproducible even when there is training code. So really it is a huge mess. I have spent the past 2 months trying to determine the best papers and the best approach as of Fall 2021, given my computational limitations.

After reading as many papers as I could over the past 3 months, driving around in my friend's Tesla, and seeing many other models (amazing AI research from Asia is just destroying us here in the US, in my opinion), I have concluded that there are not that many critical components to Level 2 highway autonomy (which should be the first step).

1) Object Detection for Vehicles/Pedestrians

2) Segmentation of Driveable Surface / Road

After much research I narrowed down the candidate models to the following:

3) Lane Detection (Segmentation with Post-Processing OR Polynomial/Keypoint/Row-Wise/Transformer/Non-Segmentation etc)

I have found that real-time segmentation approaches for lane markings need to incorporate high-resolution features and should not resize the input image before doing anything else, as many networks do. This is because the semantics of the lane lines beyond a certain distance from the vehicle are lost at lower resolution, and then you can't predict the path far enough in advance. I also do not know how to post-process the segmentation-based approach properly, as I have not yet experimented with DBSCAN or RANSAC. From all my readings, the following shows the most potential to me, but it seems to require the 4-lane probability output rather than just the binary lane segmentation mask I am currently producing:

https://github.com/czming/RONELD-Lane-Detection https://arxiv.org/abs/2010.09548
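For readers unfamiliar with the RANSAC option mentioned above, here is a minimal pure-Python sketch of fitting a single straight lane to mask pixels. This is not the RONELD method and not code from any of the linked repositories. The function name `ransac_line`, the line model `x = a*y + b`, the tolerance, and the synthetic data are all assumptions made for illustration. It also assumes the pixels of one lane have already been separated from the others, for example by clustering or by using per-lane output channels.

```python
import random


def ransac_line(points, iters=200, tol=2.0, seed=0):
    """Fit x = a*y + b to (y, x) pixels with RANSAC; returns (a, b, inliers)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (y1, x1), (y2, x2) = rng.sample(points, 2)
        if y1 == y2:
            continue  # degenerate sample: x cannot be written as a function of y
        a = (x2 - x1) / (y2 - y1)
        b = x1 - a * y1
        inliers = [(y, x) for y, x in points if abs(a * y + b - x) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no valid line found")
    # Refine with ordinary least squares on the consensus set.
    n = len(best_inliers)
    mean_y = sum(y for y, _ in best_inliers) / n
    mean_x = sum(x for _, x in best_inliers) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in best_inliers)
    cov = sum((y - mean_y) * (x - mean_x) for y, x in best_inliers)
    a = cov / var_y
    b = mean_x - a * mean_y
    return a, b, best_inliers


# Synthetic example: 50 lane pixels on x = 0.5*y + 100, plus 15 random outliers.
noise = random.Random(1)
lane = [(y, 0.5 * y + 100) for y in range(200, 400, 4)]
outliers = [(noise.randint(200, 399), noise.randint(0, 639)) for _ in range(15)]
a, b, inliers = ransac_line(lane + outliers)
print(f"a={a:.3f}, b={b:.1f}, inliers={len(inliers)}")
```

The line is parameterized as x in terms of the image row y because lanes are usually closer to vertical than horizontal in a front-facing camera image. Curved lanes would need a higher-order model fitted the same way.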

When I tested the models at https://github.com/Turoad/lanedet#Benchmark-and-model-zoo, the CondLaneNet model performed the best, which is why I recommended it be placed on the Roadmap, although it uses a heavy ResNet-101 backbone.

4) Depth (Monocular/Stereo, ideally monocular)

6) Bird's Eye View / Top View / Projection Transform

Finally, this is the best GitHub project that I've been able to find related to self-driving. All the models seem to come from the OpenVINO / PINTO model libraries.

https://github.com/iwatake2222/self-driving-ish_computer_vision_system

I'm not sure what this model is trained on or what architecture it uses, but it seems to work well: https://docs.openvino.ai/2018_R5/_docs_Transportation_segmentation_curbs_release1_caffe_desc_road_segmentation_adas_0001.html

On the control side of things, an MPC solver is the solution to use for lateral/longitudinal control.

This is all the information I have been able to collect. I automatically discarded models that didn't have code implementations, but I think I could have made a mistake here or there. I was going to keep all this information to myself, but all I do is code non-stop all day, have no life, and still only make slow progress. My friends just work at companies like Facebook and Google and don't want to work on anything exciting. It is impossible to get them to give up stock to work with me, and there is also a learning curve. So hopefully, if I give back to the open-source community, it will give back to me, and we can all improve science and also make money.

voldemortX commented 2 years ago

Well, my knowledge of self-driving is kind of only at the research stage for now, and I certainly learned a lot from your comments. Though I'm still skeptical about deep learning's performance in actual self-driving applications, especially on RGB inputs. So what I do now at SenseTime is more about human-centric applications.

Btw, we'll release our own lane detector later, early 2022 at the latest. It is end-to-end and achieves reasonable performance (~75 on CULane, ~95 on LLAMAS) at 150 FPS in PyTorch with a small model, which we believe could be beneficial for applications. But it needs to remain in a private repo for now.

voldemortX commented 2 years ago

@SikandAlex We have now pushed initial support for ONNX and TensorRT conversions. Maybe it can be helpful for your applications? Refer to DEPLOY.md.